ABSTRACT

Weak gravitational lensing has been used extensively in the past decade to constrain 
the masses of galaxy clusters, and is the most promising observational technique for 
providing the mass calibration necessary for precision cosmology with clusters. There 
are several challenges in estimating cluster masses, particularly (a) the sensitivity 
to astrophysical effects and observational systematics that modify the signal relative 
to the theoretical expectations, and (b) biases that can arise due to assumptions in 
the mass estimation method, such as the assumed radial profile of the cluster. All of 
these challenges are more problematic in the inner regions of the cluster, suggesting 
that their influence would ideally be suppressed for the purpose of mass estimation. 
However, at any given radius the differential surface density measured by lensing 
is sensitive to all mass within that radius, and the corrupted signal from the inner 
parts is spread out to all scales. We develop a new statistic $\Upsilon(R; R_0)$ that is ideal
for estimation of cluster masses because it completely eliminates mass contributions 
below a chosen scale (which we suggest should be about 20 per cent of the virial 
radius), and thus reduces sensitivity to systematic and astrophysical effects. We use 
simulated and analytical profiles including shape noise to quantify systematic biases 
on the estimated masses for several standard methods of mass estimation, finding that 
these can lead to significant mass biases that range from ten to over fifty per cent. 
The mass uncertainties when using the new statistic $\Upsilon(R; R_0)$ are reduced by up to a
factor of ten relative to the standard methods, while only moderately increasing the 
statistical errors. This new method of mass estimation will enable a higher level of 
precision in future science work with weak lensing mass estimates for galaxy clusters. 
1 INTRODUCTION

Many scientific applications require robust measurements of 
the mass in galaxy clusters. One such application is the use 
of the dark matter halo mass function to constrain cosmo- 
logical model parameters, including the amplitude of matter 
density perturbations, the average matter density, and even 
the equation of state of dark energy (e.g., most recently,
Rines et al. 2007; Mantz et al. 2008; Vikhlinin et al. 2009;
Rozo et al. 2010). Another example is validation and refinement
of models of cluster formation and evolution, which
predict relations between the more easily measured optical
and X-ray emission, and the underlying dark matter halo
(Kravtsov et al. 2006; Nagai et al. 2007; Zhang et al. 2008;
Borgani & Kravtsov 2009). Currently, there are thousands
of known clusters selected in various ways that can be used 
for these applications. Future surveys such as the Dark Energy
Survey (DES), Pan-STARRS, and the Large Synoptic
Survey Telescope (LSST) will provide even larger and
deeper samples that can be used for this purpose, requir- 
ing greater systematic robustness in the mass measures to 
complement the smaller statistical errors. 

Many different methods have been used to measure 
the halo profile of clusters and thereby estimate their 
masses. Kinematic tracers such as satellite galaxies, in 
combination with a Jeans analysis or caustics analysis, 
can give information over a wide range of physical scales 
and halo masses. While the issues of relaxation, velocity 
bias, anisotropy of the orbits and interlopers need to be 
carefully addressed, recent results suggest a good agreement
with theoretical predictions for the form of the density
profile (Biviano & Girardi 2003; Katgert et al. 2004;
Rines et al. 2003; Diaferio et al. 2005; Rines & Diaferio
2006; Salucci et al. 2007). Hydrostatic analyses of X-ray
intensity profiles of clusters use X-ray intensity and temperature
as a function of radius to reconstruct the density profile
and estimate a halo mass. The advantage of thermal gas
pressure being isotropic is partially lost due to the possible
presence of other sources of pressure support, such as turbulence,
cosmic rays or magnetic fields. These extra sources
of pressure support cannot be strongly constrained for typical
clusters with present X-ray data (Schuecker et al. 2004),
but could modify the hydrostatic equilibrium and affect the
conclusions of such analyses. Recent results are encouraging
and are in broad agreement with predictions, although
most require concentrations that are higher than those predicted
by a concordance cosmology (Vikhlinin et al. 2006;
Buote et al. 2007; Schmidt & Allen 2007). While the above-mentioned
systematic biases cannot be excluded, the small
discrepancy could also be due to baryonic effects in the central
regions, due to selection of relaxed clusters that may be
more concentrated than average (Vikhlinin et al. 2006), or
due to the fact that at a given X-ray flux limit, the more
concentrated clusters near the limiting mass are more likely
to be included in the sample (Fedeli et al. 2007).

Gravitational lensing is by definition sensitive to the to- 
tal mass, and is therefore one of the most promising meth- 
ods to measure the mass profile independent of the dy- 
namical state of the clusters. Many previous weak lensing 
analyses have focused on individual clusters (for example,
Hoekstra 2007; Pedersen & Dahle 2007; Abate et al. 2009;
Okabe et al. 2009). Measuring the matter distribution of
individual clusters allows a comparison with the combined
baryonic (light and gas) distribution on an individual basis,
and so can constrain models that relate the two, such
as MOND versus CDM (Clowe et al. 2006). However, these
measurements can be quite noisy for individual clusters. 
Stacking the signal from many clusters can ameliorate this 
problem, since shape noise and the signal due to correlated 
structures will be averaged out. Such a statistical approach 
is thus advantageous if one is to compare the observations to 
theoretical predictions, which also average over a large num- 
ber of halos in simulations. A final advantage of stacking is 
that it allows for the lensing measurement of lower-mass 
halos, where individual detection is impossible due to their 
lower shears relative to more massive clusters. Individual 
high signal-to-noise cluster observations and those based on 
stacked analysis of many clusters are thus complementary to 
each other at the high mass end, with the stacked analysis 
drastically increasing the available baseline in mass. 

Extraction of cluster dark matter halo masses from the 
weak lensing signal is subject to a number of uncertainties, 
which we discuss in this paper in detail, including the ways 
that the uncertainties differ for individual versus stacked 
cluster lensing analyses. In brief, these uncertainties are: (i) 
biased calibration of the lensing signal; (ii) modification of 
the lensing profile in the inner cluster regions due to acciden- 
tal inclusion of cluster member galaxies in the source sam- 
ple, intrinsic alignments of those galaxies, non-weak shear, 
magnification, baryonic effects that modify the initial clus- 
ter dark matter halo density profile, and cluster centroiding 



errors; (iii) contributions to the lensing signal from nonviri- 
alized local structures and large-scale structure (LSS). Fur- 
thermore, parametric modeling of the mass requires the as- 
sumption of a form for the dark matter halo profile, which 
may differ from the intrinsic profile and/or have poorly con- 
strained parameters. Non-parametric modeling, while not 
subject to this weakness, results in projected masses that 
must be converted to three-dimensional enclosed masses to 
be compared against the theory predictions, all of which are 
currently phrased in terms of 3d masses. We quantify the de- 
gree to which this conversion depends on assumptions about 
the density profile. Generally, we show the effects of many 
of these uncertainties on the estimated masses from cluster 
weak lensing analyses, both in the stacked and individual 
cases, using parametric and non-parametric mass modeling. 

Effects that modify the cluster density profile in the
inner regions ($< 0.5\,h^{-1}$ Mpc) are particularly problematic
given that the weak lensing signal $\Delta\Sigma(R)$ is sensitive to the
density profile not just at a projected separation $R$, but also
at all smaller separations. We propose a modified statistic,
denoted $\Upsilon(R; R_0)$, that removes the dependence on the
projected density between $R = 0$ and $R = R_0$, with $R_0$ chosen
to avoid scales with systematic uncertainties. The decrease
in systematic errors that results from removing scales below
$R_0$ comes at the expense of somewhat increased statistical
errors. We explore the optimal choice of $R_0$, and quantify
the degree to which our use of this new statistic to estimate
cluster masses lessens systematic biases and increases statistical
errors. Our tools for this investigation include simple,
idealised cluster density profiles; more complex and realistic
density profiles from N-body simulations; and finally,
real cluster lensing data from the Sloan Digital Sky Survey
(SDSS; York et al. 2000) that was previously analysed by
Mandelbaum et al. (2008a).
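
The explicit definition of the new statistic is not given in this introduction. As a minimal sketch, assume the form $\Upsilon(R; R_0) = \Delta\Sigma(R) - (R_0/R)^2\,\Delta\Sigma(R_0)$, which subtracts the part of the signal sourced by mass inside $R_0$. The example below (function names and numbers are illustrative assumptions) checks the claimed property on the simplest inner perturbation, a central point mass, whose contribution should cancel at every $R > R_0$.

```python
import math

def delta_sigma_point_mass(R, mass):
    """Differential surface density of a central point mass: the mean surface
    density inside R is mass / (pi R^2) and the local surface density is zero."""
    return mass / (math.pi * R**2)

def upsilon(delta_sigma, R, R0):
    """Assumed form of the statistic: DeltaSigma(R) - (R0/R)^2 DeltaSigma(R0)."""
    return delta_sigma(R) - (R0 / R) ** 2 * delta_sigma(R0)

# Hypothetical numbers: a 1e13 Msun/h central perturbation and R0 = 0.5 Mpc/h.
perturbation = lambda R: delta_sigma_point_mass(R, 1.0e13)
for R in (0.6, 1.0, 2.0):
    # The second column is non-zero, the third vanishes up to rounding error.
    print(R, perturbation(R), upsilon(perturbation, R, 0.5))
```

Any redistribution of mass strictly inside $R_0$ behaves the same way under this assumed form, since only the total enclosed mass enters $\Delta\Sigma$ at larger radii.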



We begin in Section 2 with a discussion of the theoretical
aspects of cluster-galaxy weak lensing, including a
detailed discussion of the challenges of mass determination,
and a summary of typical approaches to parametric and
non-parametric mass estimation, with the introduction of a new
statistic from which to derive parametric mass estimates. In
Section 3 we describe the N-body simulations that we use
to provide sample cluster density profiles. Section 4 has a
description of the SDSS cluster lensing data we use to test
for some of the effects that we find using the simulations.
Results for both the theoretical profiles and the real data
are presented in Section 5. We conclude with a discussion of
our findings and their implications in Section 6.



2 THEORY 

This section includes theoretical background related to 
cluster-galaxy weak lensing, modeling of cluster masses using
lensing, and the new statistic that we propose as optimal
for cluster mass estimation.



2.1 Standard lensing formalism 

Cluster-galaxy weak lensing provides a simple way to probe 
the connection between clusters and matter via their
cross-correlation function $\xi_{\rm cl,m}(r)$, defined as

$$\xi_{\rm cl,m}(\mathbf{r}) = \langle \delta_{\rm cl}(\mathbf{x})\,\delta_{\rm m}(\mathbf{x}+\mathbf{r}) \rangle_{\mathbf{x}}, \tag{1}$$

where $\delta_{\rm cl}$ and $\delta_{\rm m}$ are overdensities of clusters and matter,
respectively ($\delta_{\rm m} = \rho_{\rm m}/\bar\rho_{\rm m} - 1$). This cross-correlation can
be related to the projected surface density

$$\Sigma(R) = \bar\rho \int \left[1 + \xi_{\rm cl,m}\!\left(\sqrt{R^2+\chi^2}\right)\right] d\chi, \tag{2}$$

where $\bar\rho$ is the mean matter density, $R$ is the transverse
separation and $\chi$ the line-of-sight direction over which we are
projecting. Here, we ignore the line-of-sight window function,
which is hundreds of megaparsecs broad and not relevant
at cluster scales. For this paper, we are primarily
interested in the contribution to the cluster-matter
cross-correlation from the cluster halo density profile $\rho_{\rm cl}$ itself,
rather than from other structures, and hence

$$\Sigma(R) = \int_{-\infty}^{\infty} \rho_{\rm cl}\!\left(r = \sqrt{\chi^2 + R^2}\right) d\chi. \tag{3}$$
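
The line-of-sight projection in Eq. (3) can be evaluated numerically for any spherical profile. The following is a minimal sketch, not the authors' implementation: the function name, the substitution $\chi = R\tan\theta$ and the midpoint rule are choices made here for illustration.

```python
import math

def projected_density(rho, R, n_steps=2000):
    """Eq. (3): Sigma(R) = integral of rho(sqrt(chi^2 + R^2)) d chi over the
    whole line of sight. Substituting chi = R tan(theta) maps the infinite
    range onto (-pi/2, pi/2); the midpoint rule avoids the endpoints."""
    d_theta = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = -0.5 * math.pi + (i + 0.5) * d_theta
        cos_t = math.cos(theta)
        r = R / cos_t                      # 3D radius at this line-of-sight position
        total += rho(r) * R / cos_t**2 * d_theta
    return total

# Check against a case with a closed form: rho = r^-2 gives Sigma = pi / R.
isothermal = lambda r: r ** -2
print(projected_density(isothermal, 0.5), math.pi / 0.5)
```

The same routine accepts any callable `rho`, provided the profile falls off faster than $r^{-1}$ at large radius so that the projection converges.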



The surface density is then related to the observable
quantity for lensing, called the differential surface density,

$$\Delta\Sigma(R) = \gamma_t(R)\,\Sigma_c = \bar\Sigma(<R) - \Sigma(R), \tag{4}$$

where $\gamma_t$ is the tangential shear (a weak but coherent distortion
in the shapes of background galaxies) and $\Sigma_c$ is a
geometric factor,

$$\Sigma_c = \frac{c^2}{4\pi G}\, \frac{D_S}{D_L\, D_{LS}\, (1+z_L)^2}. \tag{5}$$

Here $D_L$, $D_S$, and $D_{LS}$ are (physical) angular diameter
distances to the lens, to the source, and between the lens
and source, respectively. In the second relation in Eq. (4),
$\bar\Sigma(<R)$ is the average value of the surface density within
some radius $R$,

$$\bar\Sigma(<R) = \frac{2}{R^2} \int_0^R R'\,\Sigma(R')\,dR'. \tag{6}$$

The second equality of Eq. (4) is true in the weak lensing
limit, for a matter distribution that is axisymmetric along
the line of sight (which is naturally achieved by the procedure
of stacking many clusters to determine their average
lensing signal), or in the non-axisymmetric case, provided
that $\Sigma$ is averaged azimuthally. For individual cluster analyses,
profiles can be fit either using average shears in annuli,
or with full, two-dimensional shear maps.
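
Eqs. (4) and (6) translate directly into a short numerical recipe. The sketch below assumes a callable surface density and a simple midpoint rule; the names and the test profiles are illustrative only. It makes explicit that $\Delta\Sigma(R)$ depends on the surface density at every radius below $R$.

```python
import math

def mean_sigma_within(sigma, R, n_steps=4000):
    """Eq. (6): (2 / R^2) * integral_0^R R' Sigma(R') dR', midpoint rule."""
    dR = R / n_steps
    total = 0.0
    for i in range(n_steps):
        Rp = (i + 0.5) * dR
        total += Rp * sigma(Rp) * dR
    return 2.0 * total / R**2

def delta_sigma(sigma, R):
    """Eq. (4): DeltaSigma(R) = mean Sigma(<R) - Sigma(R)."""
    return mean_sigma_within(sigma, R) - sigma(R)

# For Sigma = A / R the mean within R is 2A / R, so DeltaSigma equals Sigma.
sigma_isothermal = lambda R: 3.0 / R
print(delta_sigma(sigma_isothermal, 1.5), sigma_isothermal(1.5))

# A uniform mass sheet produces no differential signal.
print(delta_sigma(lambda R: 7.0, 1.5))
```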

Unless otherwise noted, all computations assume a flat
$\Lambda$CDM universe with matter density relative to the critical
density $\Omega_m = 0.25$ and $\Omega_\Lambda = 0.75$. Distances quoted for
transverse lens-source separation are comoving (rather than
physical) $h^{-1}$ Mpc, where the Hubble constant $H_0 = 100\,h$
km s$^{-1}$ Mpc$^{-1}$. Likewise, the differential surface density $\Delta\Sigma$
is computed in comoving coordinates, Eq. (5), and the factor
of $(1 + z_L)^{-2}$ arises due to our use of comoving coordinates.
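
For concreteness, the geometric factor of Eq. (5) can be evaluated for the flat cosmology quoted above. This is a hedged sketch: the function names, the numerical integration and the example redshifts are assumptions made for illustration, and the constants are standard values rather than numbers taken from the paper.

```python
import math

C_KM_S = 299792.458            # speed of light [km/s]
G_MPC = 4.30091e-9             # Newton's constant [Mpc (km/s)^2 / Msun]
HUBBLE_DIST = C_KM_S / 100.0   # c / H0 [Mpc/h]

def comoving_distance(z, omega_m=0.25, n_steps=2000):
    """Line-of-sight comoving distance [Mpc/h] in flat LCDM (midpoint rule)."""
    dz = z / n_steps
    total = 0.0
    for i in range(n_steps):
        zi = (i + 0.5) * dz
        total += dz / math.sqrt(omega_m * (1 + zi) ** 3 + 1.0 - omega_m)
    return HUBBLE_DIST * total

def sigma_crit_comoving(z_lens, z_src, omega_m=0.25):
    """Eq. (5) with physical angular diameter distances; the (1 + z_L)^-2
    factor converts to a comoving surface density [h Msun / Mpc^2]."""
    chi_l = comoving_distance(z_lens, omega_m)
    chi_s = comoving_distance(z_src, omega_m)
    d_l = chi_l / (1 + z_lens)
    d_s = chi_s / (1 + z_src)
    d_ls = (chi_s - chi_l) / (1 + z_src)   # valid only for a flat universe
    prefactor = C_KM_S**2 / (4.0 * math.pi * G_MPC)
    return prefactor * d_s / (d_l * d_ls * (1 + z_lens) ** 2)

# Hypothetical lens and source redshifts.
print(sigma_crit_comoving(0.25, 0.6))
```

Because the distances are expressed in $h^{-1}$ Mpc, the result carries one power of $h$, matching the comoving units used for $\Delta\Sigma$.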



2.2 Theoretical challenges in cluster mass 
modeling 

In this section, we discuss theoretical challenges in cluster 
mass modeling. By "theoretical" challenges, we refer to issues
that cause the underlying cluster density profile (surface
density $\Sigma$) to be unknown. This uncertainty in $\Sigma$ at a given
scale $R$ is propagated to larger scales in $\Delta\Sigma(R)$ because of
its dependence on $\bar\Sigma(<R)$ (Eqs. (4) and (6)).



2.2.1 Unknown density profile 

When attempting to extract three-dimensional enclosed 
masses from the projected lensing data, the unknown den- 
sity profile may lead to a biased mass estimate. For example, 
even for the latest generation of simulations, the concentra- 
tion parameter (defined more precisely below) of clusters 
remains somewhat uncertain, with differences at the level
of 20 per cent at the high mass end (Dolag et al. 2004;
Neto et al. 2007; Zhao et al. 2009). The concentration parameter
at a given mass is also affected by the assumed
cosmological model, especially the amplitude of perturbations.
For a given halo mass, the differences between the profiles 
increase towards the inner parts of the cluster, and if only 
those scales are used in parametric fits for mass estimation, 
this can result in a significant error on the halo mass. In 
this paper, we investigate bias due to unknown cluster con- 
centration extensively, including the use of parametric mass 
estimators with assumptions about the form of the profile, 
and the use of non-parametric projected mass estimates that 
require an assumption about the profile to get a 3d enclosed 
mass. 



2.2.2 Baryonic effects 

The effect of baryons on the cluster mass distribution is
unclear, but may be significant in the inner cluster regions
(Blumenthal et al. 1986; Gnedin et al. 2004; Naab et al. 200?;
Zentner et al. 2008; Barkana & Loeb 2009). Baryon cooling
not only brings significant mass into the inner regions of the
cluster, but may
also redistribute the dark matter out to much larger scales 
than the scale of baryon cooling. These works suggest that 
the effect of baryons is to change the cluster matter profile 
in the inner regions in a way that roughly mimics a change 
in the halo concentration; however, the extent of this effect 
in reality, and the affected scales, is unknown. 



2.2.3 Offsets from minimum of cluster potential 

The cluster centre about which the lensing signal should 
be computed can be determined using a variety of meth- 
ods. The most reliable approach is to use the peak in X- 
ray or Sunyaev-Zeldovich flux. For optically-identified clus- 
ters, the usual method is to find the brightest cluster galaxy 
(BCG). The offsets from the true cluster centre arise due to 
two effects: (1) BCGs may be slightly perturbed from the 
minimum of the cluster potential well by some real physi- 
cal effect, such as an infalling satellite, and (2) photomet- 
ric redshift errors and/or limitations in the cluster detection 
technique (when detecting clusters using imaging data) may 
lead to the wrong galaxy being chosen as the BCG. This lat- 
ter effect might occur, for example, with red-sequence clus- 
ter finding algorithms, in cases of BCGs with bluer colours 
(estimated to be ~25 per cent of the BCG population
in reality, Bildfell et al. 2008). As was discussed quantitatively
in Johnston et al. (2007), the effect of BCG offsets on
stacked cluster lensing data is to convolve the surface density
$\Sigma(R)$ with some BCG offset distribution, which tends
to suppress the lensing signal in the inner regions (similar,
qualitatively, to the effect of the previous two systematic issues
we have discussed). Consequently, fitted cluster masses
and concentrations will be reduced due to these centroiding
errors (Guzik & Seljak 2002; Yang et al. 2006). Note that
while cluster centroiding errors arise due to observational
limitations, we classify them as a theoretical issue because of
their impact on $\Sigma(R)$, which leaks to larger scales in $\Delta\Sigma(R)$.
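
The suppression caused by miscentring can be illustrated with a single fixed offset. The sketch below assumes the standard geometric picture in which the observed ring of radius $R$ is drawn around a point displaced by $R_{\rm off}$ from the true centre; the full effect described above would further average this over the distribution of offsets. The profile and numbers are hypothetical.

```python
import math

def miscentred_sigma(sigma, R, R_off, n_steps=720):
    """Azimuthal average of Sigma on a ring of radius R drawn about a point
    displaced by R_off from the true centre."""
    total = 0.0
    for i in range(n_steps):
        theta = 2.0 * math.pi * (i + 0.5) / n_steps
        # Distance from the true centre to this point on the ring.
        r_true = math.sqrt(R**2 + R_off**2 + 2.0 * R * R_off * math.cos(theta))
        total += sigma(r_true)
    return total / n_steps

# Hypothetical cored profile (arbitrary units) and a single offset of 0.4.
sigma = lambda R: 1.0 / (R + 0.05)
for R in (0.1, 1.0, 10.0):
    print(R, sigma(R), miscentred_sigma(sigma, R, 0.4))
```

In this toy case the miscentred profile is strongly suppressed inside the offset scale and converges to the true profile at radii much larger than the offset.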

Studies comparing the BCG position to the cluster 
centre defined by either the X-ray intensity or by the average
satellite velocity have found that the typical displacement
is about 2-3 per cent of the virial radius when
the BCG is properly identified (van den Bosch et al. 2005;
Koester et al. 2007a,b; Bildfell et al. 2008). The last of these
studies finds that for about 10 per cent of BCGs, the displacement
is > 10 per cent of the virial radius. Another
study that includes red galaxy photometric errors (i.e., both
causes of offsets rather than just the first) finds that the
median displacement is 10 per cent of the virial radius
(Ho et al. 2009).

Because the real data we use as a test case uses the 
maxBCG lens sample, we focus in more detail on the is- 
sue of BCG offsets for that cluster catalogue. The maxBCG 
group uses mock catalogues to estimate the distribution 
of BCG offsets resulting from the use of their algorithm
(Johnston et al. 2007). The accuracy of the distribution they
find is quite sensitive to the details of how the simulations
are populated with galaxies. In brief, their result includes a
richness-dependent fraction of misidentified BCGs (from 30
per cent at low richness to 20 per cent at high richness), and
those that are misidentified have a Gaussian distribution of
projected separation from the true cluster centre, with a
scale radius of 0.42 $h^{-1}$ Mpc.

A full discussion of how these results from mocks compare
with observations can be found in Mandelbaum et al.
(2008a). To summarize, at high masses (more than a few
$\times 10^{14}\,h^{-1} M_\odot$), a comparison with X-rays (Koester et al.
2007b; Ho et al. 2009) suggests that the mocks may overestimate
the fraction of offsets greater than 250 $h^{-1}$ kpc.
However, the true level of offsets for the majority of the 
cluster catalogue is poorly constrained from the real data. 



2.3 Observational challenges in cluster mass 
modeling 

In this section, we discuss observational challenges in cluster
mass modeling. We define "observational" challenges as
those that result in difficulty in properly measuring $\Delta\Sigma(R)$
for a given density profile $\Sigma(R)$.



2.3.1 Lensing signal calibration 

The overall calibration of the cluster-galaxy lensing signal is an
important issue for cluster mass estimates. The signal may
be miscalibrated due to shape measurement systematics
(e.g., Heymans et al. 2006; Massey et al. 2007; Bridle et al.
2009), unknown lens and/or source redshift distributions
(e.g., Kleinheinrich et al. 2005; Mandelbaum et al. 2008b),
and contamination of the "source" sample by stars. The effect
of miscalibration typically is to multiply the signal on
all scales by a single multiplicative factor. We will investigate
the effect of changes in lensing signal calibration on
the estimated masses when fitting both parametrically and
non-parametrically.



2.3.2 Signal dilution due to cluster member galaxies 

In principle, in the absence of intrinsic alignments, contam- 
ination of the source sample by cluster member galaxies 
will dilute the cluster lensing signal, since the cluster mem- 
ber galaxies are not lensed. Thus, they suppress the cluster 
lensing signal, with the strongest effect towards the cluster 
centre where the member galaxies are most numerous. For 
stacked cluster lensing data, this effect may be effectively 
removed by cross-correlating random points with the source 
catalogue, and boosting the signal by the scale-dependent 
ratio of the weighted number of sources around the real clusters
to that around the random points (Hirata et al. 2004;
Sheldon et al. 2004; Mandelbaum et al. 2005a).
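
The boost-factor correction described above amounts to one ratio per radial bin. The sketch below is a simplified illustration with hypothetical inputs: it assumes the two weight lists are already normalised to the same number of lenses and random points, and the function names are not taken from any cited pipeline.

```python
def boost_factor(weights_around_clusters, weights_around_randoms):
    """Ratio of summed source weights in one radial bin around real clusters
    to that around random points (both normalised per lens)."""
    return sum(weights_around_clusters) / sum(weights_around_randoms)

def corrected_signal(delta_sigma_measured, boost):
    """Undo the dilution: unlensed members lower the mean signal by 1 / boost."""
    return delta_sigma_measured * boost

# Hypothetical bin with a 25 per cent excess of sources from cluster members.
b = boost_factor([1.0] * 125, [1.0] * 100)
print(b, corrected_signal(80.0, b))
```

The correction is scale dependent in practice, so the ratio would be computed separately in each radial bin.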

For measurements of individual cluster lenses, the best 
way around this problem is to use some colour-based cri- 
terion that removes the cluster member galaxies. Without 
multicolour imaging, contamination of the lensing signal 
can be several tens of per cent on a few hundred $h^{-1}$ kpc
scales (Broadhurst et al. 2005; Limousin et al. 2007), and
even with it, there may be residual dilution of the signal of
approximately ten per cent on those scales (Hoekstra 2007;
Okabe et al. 2009). This scale-dependent suppression of the
signal results in underestimation of the cluster mass and 
concentration. 



2.3.3 Intrinsic alignments 

Intrinsic alignments of galaxy shapes with the local tidal 
field can affect cluster lensing measurements when clus- 
ter member galaxies that are treated as sources actu- 
ally have some mean alignment of their shapes radi- 
ally towards the cluster centre. This effect, which leads 
to a suppression of the lensing signal that is worse 
at smaller transverse separations, has been detected
observationally in several contexts (Agustsson & Brainerd
2006; Mandelbaum et al. 2006a; Faltenbacher et al. 2007;
Hirata et al. 2007; Siverd et al. 2009). Its amplitude varies
with cluster mass, member galaxy type, and separation from 
the cluster centre. 

The best way to avoid this effect is to remove cluster 
member galaxies from the source catalogue, but a perfect
removal is often not possible, as described in Section 2.3.2 and
references therein. When using a very large stacked sample, 
the amplitude of the effect may be roughly estimated using 
the estimated shear from the sample of galaxies that were 
chosen based on the colour-redshift relation to be cluster 
member galaxies. This test, however, is only possible with 
good colour information for the source galaxies. We defer a 
detailed discussion of the effects of intrinsic alignments on 
weak lensing cluster mass estimates to future work, but the 
sign is always to lower the signal (and therefore mass) in a 
way that is worse at smaller cluster-centric radius. 



2.3.4 Non-weak shear and magnification effects 

The measured weak lensing signal is not precisely the tangential
shear $\gamma_t$, but rather the reduced shear $g = \gamma_t/(1 - \kappa)$,
where $\kappa = \Sigma/\Sigma_c$ is the convergence. For a typical cluster
density profile, the difference between $g$ and $\gamma_t$ is of order
unity at the critical radius where $\kappa = 1$ (that depends on the
redshift, but can be as large as 100 $h^{-1}$ kpc) reducing to a
few per cent out to transverse separations of ~500 $h^{-1}$ kpc,
beyond which the assumption that $g \approx \gamma_t$ is quite accurate.
This distinction may be explicitly taken into account
using parametric mass models (Mandelbaum et al. 2006b),
but is typically ignored in non-parametric mass estimation 
(though since those estimates usually do not rely on the 
shear on small scales, this neglect is not necessarily a prob- 
lem). 
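
The size of the non-weak-shear correction follows directly from the definition of the reduced shear. A minimal sketch, with hypothetical convergence values chosen to span the outskirts and the inner regions:

```python
def reduced_shear(gamma_t, kappa):
    """g = gamma_t / (1 - kappa), the quantity actually measured from shapes."""
    return gamma_t / (1.0 - kappa)

def fractional_excess(gamma_t, kappa):
    """(g - gamma_t) / gamma_t, which equals kappa / (1 - kappa)."""
    return reduced_shear(gamma_t, kappa) / gamma_t - 1.0

# Hypothetical convergences from the outskirts towards the centre.
for kappa in (0.02, 0.1, 0.5):
    print(kappa, fractional_excess(0.05, kappa))
```

The fractional difference depends only on the convergence, so it is a few per cent where $\kappa$ is a few per cent and grows to order unity as $\kappa$ approaches one.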

A related effect is magnification, which alters the source 
galaxy population by changing the measured fluxes and
sizes⁴ (Mandelbaum et al. 2005a; Schmidt et al. 2009). As a
result, their redshift distribution may change, and the num- 
ber density of sources near lens galaxies typically differs from 
that in the field. Furthermore, if a correction is made to the 
observed weak lensing signal for stacked clusters to account 
for the dilution due to cluster member galaxies included in 
the source sample, as suggested in Section 2.3.2, then this
correction must be carried out by using the observed source 
number densities relative to that around random points. The 
boost factor is supposed to only correct for changes in source 
number density due to clustering (which introduces unlensed 
galaxies into the source sample). Since the number density 
of lensed galaxies may legitimately be altered by magnifica- 
tion, magnification can lead to incorrect boost factors. This 
effect may be accounted for using parametric mass model- 
ing, provided that the properties of the source sample at the 
flux and apparent size limits are reasonably well understood
(Mandelbaum et al. 2006b).



4 The change in apparent size may not be important for typical
photometric data, but weak lensing measurements require imposition
of an apparent size cut on the galaxies to ensure that they
are well-resolved relative to the point-spread function (PSF).



2.4 Summary of the challenges and how we model
them

The challenges discussed in the previous two subsections result
in three types of changes in the lensing signal. One
type of change is an elevation (suppression) of the lensing
signal on small scales that changes sign at some value of
transverse separation to become a suppression (elevation).
For example, this change may result from an unknown dark
matter concentration and baryonic effects. The second type
of change is a uniform suppression or elevation of the lensing
signal in the inner cluster regions, such that the lensing
signal gradually reaches the expected value at and above
some value of transverse separation. This change may result
from cluster centroiding errors, dilution of the lensing
signal due to cluster member galaxies and/or intrinsic alignments,
non-weak shear, and magnification-induced errors in
the source redshift distribution and number density. The exact
functional forms for and magnitudes of these changes,
and their characteristic scale radii, vary depending on the
situation. However, we will use two models, one for each type
of change. The final type of change in the lensing signal that
we consider is a uniform calibration offset.

The profiles we use for our test cases include pure NFW
profiles, and the cluster lensing signal observed in N-body
simulations. We modify the concentrations of these test profiles,
apply a model for the effects of cluster centroiding
errors based on mock catalogs (Johnston et al. 2007), and
rescale them all to mimic calibration offsets. However, we
do not want to rely too much on our modeling of these effects
(as concentrations and a specific centroid error model)
being correct in detail. Thus, if cluster mass determination
is to be robust, we need estimators that are as insensitive
to these types of changes in cluster profile as possible. Note
that a key feature of all three types of changes in profile
is that they affect the inner cluster regions. This fact leads
to the requirement that the small scale information is suppressed,
which will motivate a new statistic introduced in
this paper.

In all cases, we use spherically-symmetric profiles, as
is appropriate for stacked cluster lensing analyses. The observed
lensing profile is roughly equivalent to the spherical
average of the underlying triaxial density profiles of
the dark matter halos, so that the cluster masses can
be recovered to few per cent accuracy with mass estimation
assuming spherical profiles (Mandelbaum et al. 2005b;
Corless & King 2009). For individual cluster lensing estimates,
however, there is an additional level of complication
due to the assumption of a spherical profile: individual deviations
in the form of the profile from the assumed form due
to mergers, substructure (King et al. 2001), and deviations
from a spherical shape (Clowe et al. 2004; Corless & King
2007) can cause tens of per cent uncertainties in cluster mass
model parameters. We do not attempt to estimate the uncertainties
for individual cluster lensing analyses due to these
effects, relying instead on previous work.



2.5 Signal due to other mass 

The measured lensing signal is caused by the projected mass 
distribution around the cluster, and consequently it includes 
some contributions that are not part of the cluster halo, 
which will affect the mass estimates. In the case of stacked 
cluster lensing analyses, the average over these contributions 
from all clusters in the stack results in the so-called halo-halo
term, which can be modelled simply using the cluster-matter
cross-power spectrum as in, e.g., Seljak (2000) and
Mandelbaum et al. (2005b). This term becomes dominant
on several $h^{-1}$ Mpc scales. While here we use scales where
this term is sub-dominant, we will consider the question of 
how the estimated masses may be biased if this term is not 
explicitly modelled but is instead neglected. This failure to 
model the halo-halo term should tend to pull the mass es- 
timates upwards, since mass that is not part of the clus- 
ter mass distribution will be attributed to the cluster. Our 
approach is to simply use the cluster lensing signal from 
simulations without explicitly decomposing it into one- and 
halo-halo terms; thus, mass that is not part of the cluster 
mass distribution itself is implicitly included in our numer- 
ical predictions of the cluster lensing signal. 

For individual cluster lensing analyses, the effect of mat- 
ter that is not part of the cluster on the lensing signal is 
more complex, because unlike for stacked analyses, no averaging
process occurs over the structures around many clusters.
As a result, local nonvirialized structure (Metzler et al.
1999, 2001) and large-scale structure (Hoekstra 2001, 2003;
Dodelson 2004) can appear in the cluster lensing signal on all
scales, not just large scales, causing both an average bias and
significant scatter in the mass estimates. A recent numerical
study of large-scale structure projection effects on weak
lensing cluster counts (Marian et al. 2009) has shown that,
whilst there is scatter and bias in the $M_{\rm 2D}$-$M_{\rm 3D}$ relation, the
utility for such data to constrain cosmological parameters 
through the mass function is not impaired. Moreover, if one 
uses carefully constructed aperture mass shear filters, then 
the bias arising from 'correlated' large-scale structure can 
be reduced to the percent level (Marian et al. ApJ submit- 
ted.). However, the impact of 'chance' projections along the 
line-of-sight on the mass estimates is still relatively poorly 
quantified. While we use simulations to assess the effect of 
the halo-halo term on stacked cluster analyses that neglect 
it, a detailed treatment of this issue for individual cluster 
lensing analyses is beyond the scope of this paper. 



2.6 Parametric modeling of cluster masses 

In principle, we can model the cluster-galaxy weak lensing
signal as a sum of two terms, the first due to the BCG stellar
component, only important on scales below $\sim 100 h^{-1}$ kpc,
and the second due to the dark matter halo. Typically, the
halo is modelled using the broken power-law NFW density
profile (Navarro et al. 1996):

$$\rho(r) = \frac{\rho_s}{(r/r_s)(1 + r/r_s)^2} \tag{7}$$



where the scale radius $r_s$ is the scale at which the logarithmic
slope, $d\ln\rho/d\ln r$, is equal to $-2$. While this approach
to cluster mass estimation is fairly standard, recent
work (Merritt et al. 2006; Gao et al. 2008) suggests that the
Einasto profile (Einasto 1965),

$$\rho(r) = \rho_s \exp\left\{(-2/\alpha)\left[(r/r_s)^{\alpha} - 1\right]\right\}, \tag{8}$$

(where $\alpha$ has a weak mass dependence with a value around
0.15) may better describe the dark matter halo profiles. We
note here that on the scales we use for modeling in this
work, the two profiles agree to within a few per cent. Thus,
the NFW profile is sufficient for our purposes.
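As an illustration of the two profiles in Eqs. (7) and (8), the following minimal sketch evaluates both and checks numerically that the logarithmic slope equals $-2$ at the scale radius. The function names, the finite-difference helper, and the numerical value of the scale radius are illustrative assumptions, not part of the analysis in this paper.

```python
import math

def nfw_density(r, rho_s, r_s):
    """NFW profile of Eq. (7): rho_s / [(r/r_s) (1 + r/r_s)^2]."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def einasto_density(r, rho_s, r_s, alpha=0.15):
    """Einasto profile of Eq. (8): rho_s exp{(-2/alpha) [(r/r_s)^alpha - 1]}."""
    x = r / r_s
    return rho_s * math.exp((-2.0 / alpha) * (x ** alpha - 1.0))

def log_slope(profile, r, eps=1e-4):
    """Central finite-difference estimate of dln(rho)/dln(r) at radius r."""
    upper = math.log(profile(r * (1.0 + eps)))
    lower = math.log(profile(r * (1.0 - eps)))
    return (upper - lower) / (math.log(1.0 + eps) - math.log(1.0 - eps))

r_s = 0.3  # illustrative scale radius (arbitrary units)
slope_nfw = log_slope(lambda r: nfw_density(r, 1.0, r_s), r_s)
slope_einasto = log_slope(lambda r: einasto_density(r, 1.0, r_s), r_s)
```

Note that with the normalisation written in Eq. (7), the NFW density at $r = r_s$ is $\rho_s/4$, whereas the Einasto density at $r = r_s$ is $\rho_s$, so the two $\rho_s$ values are not interchangeable.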

It is convenient to parametrise the NFW profile by two
parameters, the concentration $c_{200b} = r_{200b}/r_s$ and the virial
mass $M_{200b}$. The virial radius $r_{200b}$ and $\rho_s$ can be related to
$M_{200b}$ via consistency relations. The first is that the virial
radius is defined such that the average density within it is
$200\bar{\rho}$:

$$M_{200b} = \frac{4\pi}{3} r_{200b}^3 (200\bar{\rho}). \tag{9}$$

The second relation, used to determine $\rho_s$ from $M_{200b}$ and
$c_{200b}$, is simply that the volume integral of the density profile
out to the virial radius must equal the virial mass (though
when computing the lensing signal, we do not truncate the
profiles beyond $r_{200b}$). The NFW concentration is a weakly
decreasing function of halo mass, with a typical dependence
as

$$c_{200b} \propto \left(\frac{M}{M_0}\right)^{-\beta} \tag{10}$$



with $\beta \sim 0.1$ (Bullock et al. 2001; Eke et al. 2001;
Neto et al. 2007), making this profile a one-parameter family
of profiles. The normalisation of Eq. (10) depends on
the nonlinear mass (and hence cosmology), but for the typical
range of models, one expects $c_{200b} = 5$-$8$ at
$M_0 = 10^{14} h^{-1} M_\odot$. Some more recent work (Neto et al. 2007;
Zhao et al. 2009) suggests that this mass dependence levels
off to a constant concentration above some high value
of mass. The precise value remains somewhat controversial,
with $c_{200b} \sim 5$-$6$ in Neto et al. (2007) and Zhao et al. (2009),
but some other analyses suggest a significantly higher value
around $c_{200b} \sim 7$-$8$ at $z = 0$ (J. Tinker, private communication).
In addition, if one applies the typical concentration-mass
relation assumed in, e.g., Hoekstra (2007) to very high
mass clusters, one finds very small concentration values, e.g.
$c_{200b} \sim 4$ at $M \sim 10^{15} M_\odot$. In this paper, we will assess the
effect of assuming the wrong concentration value in parametric
mass estimates, taking $c_{200b} = 4$-$7$ as the plausible
range given the current level of uncertainties.
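The two consistency relations above can be sketched in a few lines. The sketch below assumes comoving units of $h^{-1}$ Mpc and $h^{-1} M_\odot$, takes the mean matter density as $\Omega_m$ times the standard critical density, and uses $\Omega_m = 0.25$ from Table 1; the function names are illustrative only.

```python
import math

RHO_CRIT = 2.775e11  # critical density in h^2 M_sun / Mpc^3 (standard value)
OMEGA_M = 0.25       # matter density parameter, as listed in Table 1

def r200b(m200b, rho_mean=OMEGA_M * RHO_CRIT):
    """Eq. (9) inverted: radius enclosing a mean density of 200 * rho_mean."""
    return (3.0 * m200b / (4.0 * math.pi * 200.0 * rho_mean)) ** (1.0 / 3.0)

def nfw_rho_s(m200b, c200b, rho_mean=OMEGA_M * RHO_CRIT):
    """Second relation: choose rho_s so the NFW mass inside r200b equals m200b.

    Uses the analytic enclosed mass of Eq. (7),
    M(<r) = 4 pi rho_s r_s^3 [ln(1 + x) - x / (1 + x)] with x = r / r_s.
    Returns (rho_s, r_s).
    """
    r_s = r200b(m200b, rho_mean) / c200b
    mu = math.log(1.0 + c200b) - c200b / (1.0 + c200b)
    return m200b / (4.0 * math.pi * r_s ** 3 * mu), r_s

rho_s_example, r_s_example = nfw_rho_s(1e14, 5.0)
```

For a fixed mass, raising the assumed concentration shrinks $r_s$ and raises $\rho_s$, which is why a wrong concentration changes the predicted signal most strongly on small scales.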

While we demonstrate our cluster mass estimation pro- 
cedure using stacked lensing data for which a spherical 
model is appropriate, one can easily apply the same tech- 
niques using lensing data for individual clusters. In that 
case, parametric profile fitting may use a circular average 
of the shear profile, or a full shear map with the inclusion 
of a projected ellipticity and position angle among the fit 
parameters. Here, for simplicity, we assume the former. 

There are two significant practical differences between
stacked survey data and data for individual clusters. First,
survey data are typically available to large transverse
separations, whereas data for individual clusters are limited
by the field of view (FOV) of the telescope used for the
observations. For typical cluster redshifts in cluster lensing
analyses, $2 h^{-1}$ Mpc is a typical maximum radius to which
the lensing signal can be measured. Second, stacked lensing
data typically yield a concentration that is around the
mean concentration of the sample used for the stacking
(Mandelbaum et al. 2005b). As a result, the main uncertainty
in what concentration to assume for parametric mass
estimation comes from differences between the published
concentration-mass relations from N-body simulations, the
uncertainty in cosmological parameters, and the uncertainty
about how baryonic cooling may have changed the halo
concentration. In contrast, for individual cluster data the
concentration is likely to vary significantly from cluster to
cluster due to the intrinsic lognormal concentration distribution
at fixed mass; this variation of $\sim 0.15$ dex (Bullock et al.
2001) is non-negligible compared to the sources of systematic
uncertainty about halo concentration.

In this paper, when studying the effects of parametric
models on fits for the mass, we choose to fix the halo
concentration as in some individual analyses, such as Hoekstra
(2007), and some stacked analyses, such as Reyes et al.
(2008). Other works have fit simultaneously for a concentration
and a mass (e.g., Mandelbaum et al. 2008a;
Okabe et al. 2009). In the latter case, there is no concern
about biases in the mass due to assumption of the wrong
concentration, but small biases may remain due to deviations
of the profile from NFW, and there is a loss of statistical
power so that the mass estimates become noisier. Furthermore,
if there are systematic errors in the data (such as
centroiding errors or intrinsic alignments) that do not perfectly
mimic a change in concentration, those analyses may
still find a biased result for the mass. For the most part, we
wish to characterise systematic biases that can occur when
the concentration is fixed, but we will mention the effects of
allowing it to vary.
Finally, we note that parametric mass estimation lends
itself easily to corrections for effects such as non-weak shear
and magnification bias (Mandelbaum et al. 2006b). These
effects can simply be incorporated into the model before
comparing with the data.

2.7 Non-parametric modeling of cluster masses 

Another common approach to cluster mass estimation is the
non-parametric aperture mass statistic. In this work, we
present tests of the $\zeta_c$ statistic (Clowe et al. 1998), which
is related to the $\zeta$ statistic of Fahlman et al. (1994). $\zeta_c$ has
been used in several recent cluster modeling papers, including
Hoekstra (2007) and Okabe et al. (2009). This statistic
is defined using three radii: the first, $R_1$, is the transverse
separation within which we wish to estimate the projected
mass; the second and third, $R_{o1}$ and $R_{o2}$, define an outer
annulus. $\zeta_c$ is equal to the mean surface density within $R_1$
relative to that in the outer annulus:

$$\zeta_c(R_1) = \bar{\kappa}(R < R_1) - \bar{\kappa}(R_{o1} < R < R_{o2}), \tag{11}$$

where $\kappa$ is the scaled surface density or convergence,
$\kappa = \Sigma/\Sigma_c$. The aperture mass statistic can be measured from
the observed shear $\gamma_t(R)$ using

$$\zeta_c(R_1) = 2\int_{R_1}^{R_{o1}} d\ln R\, \gamma_t(R) + \frac{2}{1 - (R_{o1}/R_{o2})^2}\int_{R_{o1}}^{R_{o2}} d\ln R\, \gamma_t(R). \tag{12}$$

The 2d (projected or cylindrical) mass $M_{2d}(R_1)$ within $R_1$
can be estimated from $\zeta_c$ by

$$M_{2d}(R_1) = \pi R_1^2 \Sigma_c \zeta_c(R_1). \tag{13}$$


Typically $R_1$ is chosen to be either a fixed physical scale,
or a spherical over-density radius (determined either using
a parametric model to estimate the appropriate radius, or
iteratively using the aperture mass estimate from the data).
Various approaches are taken to the second term in Eq. (12),
which should ideally be sub-dominant to the first given the
scaling of shear with radius. For example, Hoekstra (2007)
uses the parametric fits to an NFW profile with fixed
concentration parameter to estimate the amplitude of the second
term. In contrast, Okabe et al. (2009) neglect it, after
choosing $R_{o1}$ to be 10-15 arcmin, depending on where in the
cluster field there appeared to be significant structures that
they wished to avoid (M. Takada, private communication). For
their typical cluster redshifts, this choice corresponds to
roughly 2-2.5 comoving Mpc in transverse separation. We will
consider the effect of both approaches in our tests below.

The aperture mass statistic is often used because of its
insensitivity to the details of the cluster mass profile. Furthermore,
because it estimates the mass within $R_1$ using
the shear on scales larger than $R_1$, it is not very sensitive to
systematics that affect the signal in the inner parts, such as
contamination by cluster member galaxies, intrinsic alignments,
and centroiding errors. This decreased sensitivity to
systematics comes at a price, however: as shown in Eq. (12),
the determination of $\zeta_c$ requires integration over the measured
shear profile in logarithmic annular bins, which can
often be quite noisy. Our tests will help quantify the extent
to which this noisiness increases the statistical error on the
mass estimates relative to parametric modeling.
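The aperture mass estimate of Eqs. (12) and (13) can be sketched as follows. This is a minimal illustration under stated assumptions: the tangential shear is supplied as a smooth callable (for real data it would come from binned, noisy measurements), the integrals are done with a simple trapezoidal rule in $\ln R$, and all function names are hypothetical.

```python
import math

def log_integral(func, r_lo, r_hi, n=2000):
    """Trapezoidal estimate of the integral of func(R) dlnR from r_lo to r_hi."""
    step = (math.log(r_hi) - math.log(r_lo)) / n
    total = 0.5 * (func(r_lo) + func(r_hi))
    for i in range(1, n):
        total += func(r_lo * math.exp(i * step))
    return total * step

def zeta_c(gamma_t, r1, ro1, ro2):
    """Eq. (12): inner term over [r1, ro1] plus outer-annulus term over [ro1, ro2]."""
    inner = 2.0 * log_integral(gamma_t, r1, ro1)
    outer = 2.0 / (1.0 - (ro1 / ro2) ** 2) * log_integral(gamma_t, ro1, ro2)
    return inner + outer

def m2d(zeta, r1, sigma_crit):
    """Eq. (13): projected mass within r1 from zeta_c and the critical surface density."""
    return math.pi * r1 ** 2 * sigma_crit * zeta

# Check against a singular isothermal sphere, for which kappa = gamma_t = A / R.
A = 0.05
zeta_sis = zeta_c(lambda r: A / r, 0.5, 2.0, 2.5)
zeta_sis_exact = 2.0 * A / 0.5 - 2.0 * A / (2.0 + 2.5)
```

For the isothermal test case the mean convergence inside $R_1$ is $2A/R_1$ and the mean over the annulus is $2A/(R_{o1} + R_{o2})$, so the numerical result can be compared directly with Eq. (11).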

An additional disadvantage to the use of the aperture
mass statistic and the derived $M_{2d}$ is that cosmological analyses
using the mass function, and any comparison against
X-ray-derived masses, require the use of a 3d (enclosed)
mass, $M_{3d}$. The conversion from $M_{2d}$ to $M_{3d}$ requires the
assumption of a profile, such as NFW (for which a concentration
parameter must either be assumed, or derived from
parametric fits). This conversion factor may be derived
analytically from expressions for the enclosed $M_{2d}$ and $M_{3d}$ as
in Wright & Brainerd (2000). Okabe et al. (2009) show that
the conversion factor only weakly depends on the concentration,
but for analyses that seek to determine the mass to
10 per cent, this dependence on concentration is still important.
A way of avoiding this necessity would be to determine
the mass function in terms of projected masses in the simulations,
rather than the typical practice of using $M_{3d}$ within
some spherical over-density; however, given that this has
not yet been done, we also test the effect of this $M_{2d}$ to $M_{3d}$
conversion.



2.8 New statistic for mass estimation 

As noted previously, one complication in parametric modeling
of the lensing signal $\Delta\Sigma(R)$ is the sensitivity to the
mass profile on small scales, which is particularly prone to
theoretical and observational uncertainty. We wish to avoid
sensitivity to small scales, which comes from the first term
on the right-hand side of the definition of $\Delta\Sigma(R)$, via
$\bar{\Sigma}(<R)$ (defined in Eq. 6).

Thus, we must change the lower limit of integration in
Eq. (6) from $R = 0$ to some larger scale that is not strongly
affected by small-scale systematics such as intrinsic alignments
and centroiding errors. We refer to this new minimum
scale as $R_0$, and achieve our goal by defining the annular
differential surface density (ADSD)

$$\Upsilon(R; R_0) = \Delta\Sigma(R) - \Delta\Sigma(R_0)\left(\frac{R_0}{R}\right)^2 = \frac{2}{R^2}\int_{R_0}^{R} \Sigma(R') R'\, dR' - \Sigma(R) + \Sigma(R_0)\left(\frac{R_0}{R}\right)^2. \tag{14}$$

As shown in Eq. (14), by subtracting off $\Delta\Sigma(R_0)(R_0/R)^2$
from the observed lensing signal, we achieve our goal of
removing the sensitivity to scales below $R_0$. The resulting
robustness of the analysis to systematic errors comes at the
expense of introducing slight ($\sim 10$ per cent level)
anti-correlations between the signal around $R_0$ and the signal
at larger scales, plus increased statistical errors.
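A short numerical sketch makes the cancellation in Eq. (14) concrete. Any mass placed entirely inside $R_0$ contributes $M/(\pi R^2)$ to the differential surface density at $R > R_0$, and the subtraction removes exactly that term. The power-law baseline signal and the numerical values below are purely illustrative assumptions.

```python
import math

def adsd(delta_sigma, r, r0):
    """Eq. (14): ADSD = DeltaSigma(R) - DeltaSigma(R0) * (R0 / R)^2."""
    return delta_sigma(r) - delta_sigma(r0) * (r0 / r) ** 2

def baseline(r):
    """Hypothetical smooth lensing signal in M_sun / Mpc^2 (illustrative only)."""
    return 1.0e14 * r ** -0.8

def with_central_mass(r, m_point=1.0e13):
    """Baseline plus a central point mass, which adds M / (pi R^2) at every R."""
    return baseline(r) + m_point / (math.pi * r ** 2)

r0 = 0.25
radii = [0.5, 1.0, 2.0, 4.0]
clean = [adsd(baseline, r, r0) for r in radii]
perturbed = [adsd(with_central_mass, r, r0) for r in radii]
```

The two lists agree to floating-point precision even though the perturbed differential surface density itself differs from the baseline at every radius.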

For some of this paper, we model theoretical and observational
uncertainties in $\Delta\Sigma$ as changes in the NFW concentration
parameter. However, as already discussed, some
systematics are manifested in different ways (e.g., centroiding
errors) that must be modeled rather differently. If one
truly believes that unknown concentration is the dominant
systematic uncertainty, then the simplest solution would be
to fit $\Delta\Sigma$ to an NFW profile and then marginalize over the
concentration. Since we do not believe that this procedure
is adequate for all theoretical and observational systematics,
parametric modeling of $\Upsilon(R; R_0)$ to remove all small-scale
information is a better solution that will give more accurate
mass estimates. For some systematics, we will see in
Section 5 that we do not, in general, have to select $R_0$ to
be completely above the affected scales, because the errors
in $\Upsilon(R; R_0)$ change sign and thus nearly cancel out of the
mass estimation, despite contributing to $\Delta\Sigma(R)$ with the
same sign at all scales.

In practise, to use the ADSD $\Upsilon(R; R_0)$ we must estimate
$\Delta\Sigma(R_0)$ from the data itself. In this work, we have
tried two methods of doing so based on fits to the following
functional form for $\Delta\Sigma$ in the neighbourhood of $R_0$:

$$\Delta\Sigma(R) = \Delta\Sigma(R_0)\left(\frac{R}{R_0}\right)^{p + q(R/R_0)}. \tag{15}$$

In the simpler method, $q = 0$, whereas in the more complex
method it is a free parameter in the fit (which generally
allows for a better fit to broken power-law profiles such as
NFW, but also increases the statistical errors on the mass).
We primarily present results of the latter procedure, but
discuss the trade-offs between the two in Section 5.
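One way to sketch the fit to Eq. (15) is to note that the logarithm of the functional form is linear in its three parameters: $\ln\Delta\Sigma = \ln\Delta\Sigma(R_0) + p\,x + q\,(R/R_0)\,x$ with $x = \ln(R/R_0)$. The example below uses an unweighted least-squares fit in log space, which is an assumption made for brevity (it requires positive signal values and ignores the error bars, unlike a full $\chi^2$ fit to noisy data); the function names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_delta_sigma_r0(radii, delta_sigma, r0, free_q=True):
    """Log-space least-squares fit of Eq. (15); returns (DeltaSigma(R0), p, q)."""
    rows = []
    for r in radii:
        x = math.log(r / r0)
        rows.append([1.0, x, (r / r0) * x] if free_q else [1.0, x])
    y = [math.log(v) for v in delta_sigma]
    k = len(rows[0])
    # Normal equations (A^T A) c = A^T y for the linear model in log space.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coeffs = solve_linear(ata, aty)
    return math.exp(coeffs[0]), coeffs[1], (coeffs[2] if free_q else 0.0)

# Noise-free check: generate data from Eq. (15) and recover the inputs.
r0 = 0.25
radii = [0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
truth = (100.0, -0.8, -0.2)
data = [truth[0] * (r / r0) ** (truth[1] + truth[2] * r / r0) for r in radii]
amp, p, q = fit_delta_sigma_r0(radii, data, r0)
```

Setting `free_q=False` gives the simpler pure power-law variant, which has one fewer free parameter and therefore a smaller statistical error on the fitted amplitude.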

Finally, we note that the ADSD $\Upsilon(R; R_0)$ is well-suited
not only to estimating cluster masses, but also to cosmological
studies, where the choice of $R_0$ to be outside of the host
dark matter halo virial radius allows contributions to the
lensing signal from small-scale information to be suppressed
(Baldauf et al. 2009).







$\Omega_m$   $\Omega_{\rm DE}$   $h$    $\sigma_8$   $n_s$   $w$
0.25         0.75                0.7    0.8          1.0     -1

Table 1. Cosmological parameters adopted for the simulations:
matter density relative to the critical density, dark energy density
parameter, dimensionless Hubble parameter, matter power spectrum
normalisation, primordial power spectrum slope, and dark
energy equation of state $p = w\rho$.



3 SIMULATIONS

To obtain realistic cluster lensing profiles for our tests
of mass inference methods, we use the Zurich horizon
"zHORIZON" simulations, a suite of thirty pure dissipationless
dark matter simulations of the $\Lambda$CDM cosmology (Smith
2009). Each simulation models the dark matter density field
in a box of length $L = 1500 h^{-1}$ Mpc, using $N_p = 750^3$ dark
matter particles with a mass of $M_{\rm dm} = 5.55 \times 10^{11} h^{-1} M_\odot$.
The cosmological parameters for the simulations in Table
1 are inspired by the results of the WMAP cosmic microwave
background experiment (Spergel et al. 2003, 2007).
For this work, we use eight of the thirty simulations, and
probe a volume of $27 h^{-3}$ Gpc$^3$ at redshift $z = 0.23$. The
initial conditions were set up at redshift $z = 50$ using the
2LPT code (Scoccimarro 1998). The evolution of the $N$ equal
mass particles under gravity was then followed using the
publicly available N-body code GADGET-II (Springel 2005).
Finally, gravitationally-bound structures were identified in
each simulation snapshot using a Friends-of-Friends (FoF,
Davis et al. 1985) algorithm with linking length of 0.2 times
the mean inter-particle spacing. We rejected halos containing
fewer than twenty particles, and identified the potential
minimum of the particle distribution associated with the
halo as the halo centre. We note that using the FoF halo
finder might cause some problems with the halo profile, since
FoF tends to link together nearby halos. In total, we identify
halos in the mass range
$1.1 \times 10^{13} h^{-1} M_\odot \le M_{200b} \le 4 \times 10^{15} h^{-1} M_\odot$.

3.1 Calculation of the signal

We calculate the spherically-averaged correlation function
in the simulations using direct counts of mass particles in
spherical shells about the halo centres of the cluster stack,
$N_{\rm cl,m}(r_i)$. Our estimator for the correlation function is

$$\hat{\xi}_{\rm cl,m}(r_i) = \frac{N_{\rm cl,m}(r_i)}{N^{\rm (rand)}_{\rm cl,m}(r_i)} - 1, \tag{16}$$

where $N^{\rm (rand)}_{\rm cl,m}(r_i) = N_{\rm cl} N_{\rm m} V_{\rm shell}/V_{\rm box}$ is the expected
number of pairs for a purely random sample (for $N_{\rm cl}$ and $N_{\rm m}$
defined as the total number of clusters and matter particles
in the box, respectively), and $V_{\rm shell} = 4\pi(r_{i+1}^3 - r_i^3)/3$ is the
volume of the spherical shell at $r_i$. To reduce the computational
cost of this calculation, we dilute the dark matter
density field by a factor of 24, using only $20 \times 10^6$ dark
matter particles. We have confirmed the convergence of this
procedure.

3.2 Correction for resolution effects

Despite the large dynamical range of our simulations, our
resolution is still limited on small scales. The force softening
length was set to $70 h^{-1}$ kpc, so our results may not be reliable
for $r < 200 h^{-1}$ kpc. This resolution problem limits our
ability to predict the excess surface mass density $\Delta\Sigma(R)$
on small scales, since this quantity is affected by the average
over the correlation function on even smaller scales.
Therefore, to correct for this problem, we continue the profile
toward small scales using the NFW profile as follows:

$$1 + \xi^{\rm (stitch)}_{\rm cl,m}(r) = \begin{cases} \rho_{\rm NFW}(r)/\bar{\rho} & \text{for } r < r_{\rm stitch} \\ 1 + \hat{\xi}_{\rm cl,m}(r) & \text{for } r \ge r_{\rm stitch} \end{cases} \tag{17}$$

We used the combinations ($r_{\rm stitch} = 0.2 h^{-1}$ Mpc, $c_{200b} = 5$)
and ($r_{\rm stitch} = 1.0 h^{-1}$ Mpc, $c_{200b} = 7$).

Virial radii and masses are calculated by imposing the
constraint

$$\frac{3}{r_{200b}^3 \delta}\int_0^{r_{200b}} (r')^2\, dr' \left[1 + \xi_{\rm cl,m}(r')\right] = \frac{3 M_{200b}}{4\pi r_{200b}^3 \bar{\rho}\, \delta} = 1. \tag{18}$$

The over-density of halos is assumed to be $\delta = 200$ times
the background density. The profile is then spline fitted and
integrated along the line of sight, over separations
$-50 \le \chi \le 50 h^{-1}$ Mpc from the cluster.
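The stitching of Eq. (17) and the virial-radius constraint of Eq. (18) can be sketched as follows. This is a minimal illustration assuming the profile $1 + \xi(r)$ is available as a callable and that the mean enclosed overdensity decreases monotonically with radius; the bisection bounds, step counts, and function names are illustrative choices rather than those used for the simulations.

```python
def stitched_profile(nfw_over_mean, measured, r_stitch):
    """Eq. (17): rho_NFW / rho_mean inside r_stitch, the measured 1 + xi outside."""
    return lambda r: nfw_over_mean(r) if r < r_stitch else measured(r)

def mean_enclosed_overdensity(one_plus_xi, r, n=4000):
    """(3 / r^3) * integral_0^r r'^2 [1 + xi(r')] dr', using the midpoint rule."""
    step = r / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * step
        total += rp ** 2 * one_plus_xi(rp)
    return 3.0 * total * step / r ** 3

def virial_radius(one_plus_xi, delta=200.0, r_lo=0.01, r_hi=10.0, tol=1e-8):
    """Bisect for the radius where the mean enclosed overdensity equals delta (Eq. 18).

    Assumes the enclosed overdensity is above delta at r_lo, below it at r_hi,
    and decreasing in between.
    """
    while r_hi - r_lo > tol:
        mid = 0.5 * (r_lo + r_hi)
        if mean_enclosed_overdensity(one_plus_xi, mid) > delta:
            r_lo = mid
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)

# Analytic check: for 1 + xi = 600 / r^2 the mean enclosed value is 1800 / r^2,
# which equals 200 at r = 3.
r_vir_example = virial_radius(lambda r: 600.0 / r ** 2)
```

Once the radius is found, the virial mass follows from Eq. (9) as $(4\pi/3)\, r_{200b}^3\, (200\bar{\rho})$.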



4 DATA 



The SDSS (York et al. 2000) imaged roughly $\pi$ steradians
of the sky, and followed up approximately one million
of the detected objects spectroscopically (Eisenstein et al.
2001; Richards et al. 2002; Strauss et al. 2002). The imaging
was carried out by drift-scanning the sky in photometric
conditions (Hogg et al. 2001; Ivezić et al. 2004) in






five bands (ugriz) (Fukugita et al. 1996; Smith et al. 2002)
using a specially-designed wide-field camera (Gunn et al.
1998). These imaging data were used to create the cluster
and source catalogues that we use in this paper.
All of the data were processed by completely automated
pipelines that detect and measure photometric properties
of objects, and astrometrically calibrate the data
(Lupton et al. 2001; Pier et al. 2003; Tucker et al. 2006).
The SDSS was completed with its seventh data release
(Stoughton et al. 2002; Abazajian et al. 2003, 2004, 2005;
Finkbeiner et al. 2004; Adelman-McCarthy et al. 2006,
2007, 2008; Abazajian et al. 2009).

In this paper, the only data that we use are the
maxBCG cluster lensing data previously analysed in
Mandelbaum et al. (2008a). Because the data were described
there in detail, here we simply give a brief summary.

The parent sample from which our lens samples were
derived consists of 13 823 MaxBCG clusters (Koester et al.
2007a,b), identified by the concentration of galaxies in
colour-position space using the well known red galaxy
colour-redshift relation (Gladders & Yee 2000). The sample
is based on 7500 square degrees of imaging data in SDSS.
There is a tight mass-richness relation that has been established
using dynamical information (Becker et al. 2007)
and weak lensing (Johnston et al. 2007; Mandelbaum et al.
2008a; Reyes et al. 2008) across a broad range of halo mass.
The redshift range of the maxBCG sample is $0.1 < z < 0.3$;
within these redshift limits, the sample is approximately
volume-limited with a number density of $3 \times 10^{-5}$ $(h/{\rm Mpc})^3$,
except for a tendency towards higher number density at the
lower end of this redshift range (Reyes et al. 2008). In this
paper, we use scaled richness in red galaxies above $0.4 L_*$
within $R_{200}$, known as $N_{200}$, as a primary tracer of halo
mass. For the data in Mandelbaum et al. (2008a) that we
use here, the richness range is $12 \le N_{200} \le 79$ divided into
six bins ($12 \le N_{200} \le 13$, $14 \le N_{200} \le 19$, $20 \le N_{200} \le 28$,
$29 \le N_{200} \le 39$, $40 \le N_{200} \le 54$, and $55 \le N_{200} \le 79$).

The source sample with estimates of galaxy shapes is
the same as that originally described in Mandelbaum et al.
(2005a). This source sample has over 30 million galaxies
from the SDSS imaging data with r-band model magnitude
brighter than 21.8, with shape measurements obtained using
the REGLENS pipeline, including PSF correction done via
re-Gaussianization (Hirata & Seljak 2003) and with cuts designed
to avoid various shear calibration biases. The overall
calibration uncertainty due to all systematics was originally
estimated to be eight per cent (Mandelbaum et al. 2005a),
though the redshift calibration component of this systematic
error budget has recently been decreased due to the
availability of more spectroscopic data (Mandelbaum et al.
2008b). The absolute mass calibration is not a critical issue
for this paper, in which we study the changes in estimated
mass for a given observed signal when using different estimation
procedures.



5 RESULTS 

5.1 Purely analytical profiles 

In this subsection, we add realistic levels of noise to pure
NFW profiles to create simplified mock cluster density profiles.
The profiles that we use have $\log_{10}[hM_{200b}/M_\odot] = 14.0$
and 14.8, with $c_{200b} = 4$ and $c_{200b} = 7$ (see properties of
these profiles listed in Table 2). Using these profiles, we can
test the dependence of parametric and non-parametric modeling
on assumptions about the NFW concentration parameter.
We caution that these profiles cannot be used to test
for the effects of deviations from an NFW profile on the
extracted masses when fitting assuming NFW profiles, or
for the effects of large-scale structure contributions to the
lensing signal. These are discussed in the next subsection.

These values of concentration were selected as the extremes
of the variation allowed with cosmology, and with
the various determinations of the concentration-mass relation
in the literature, including recent results suggesting that
the concentration stops decreasing with mass at the high-mass
end (Neto et al. 2007; Zhao et al. 2009). In addition,
we consider that baryonic effects may increase the concentration
of the dark matter profile (for an extreme example,
see Rudd et al. 2008). Furthermore, for individual cluster
lensing analyses, we must consider the fact that dark matter
halos exhibit a large scatter in concentration (0.15 dex,
Bullock et al. 2001), so the variation we have used is not as
extreme in this case as it may be for a stacked cluster analysis.
The change in concentration from 4 to 7 is less than $2\sigma$
of this intrinsic scatter.
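The last statement is a one-line calculation, shown here as a small sketch: the shift in $\log_{10} c$ between the two adopted concentrations is compared with the quoted lognormal scatter of 0.15 dex. The variable names are illustrative.

```python
import math

scatter_dex = 0.15                      # intrinsic scatter in log10(c) at fixed mass
shift_dex = math.log10(7.0 / 4.0)       # change in log10(c) from c = 4 to c = 7
n_sigma = shift_dex / scatter_dex       # about 1.6, i.e. below 2 sigma
```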

To generate the profiles, we begin with the cluster halo
density profile $\rho_{\rm cl}(r)$, which is defined in very narrow logarithmic
(3d) radial bins. We then numerically integrate this
profile along the line-of-sight, for comoving line-of-sight
separations $|\chi| \le 5\, r_{200b}$, to define $\Sigma(R)$ in very narrow
logarithmic bins in transverse separation $R$. We calculate
$\bar{\Sigma}(<R)$ by converting the integral in Eq. (6) to a summation.
$\Delta\Sigma(R)$ can then be computed directly from $\bar{\Sigma}(<R) - \Sigma(R)$.
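The projection steps described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the line-of-sight integral uses the substitution $\chi = R\sinh u$ with a midpoint rule (an implementation choice made here for numerical stability), the mean surface density is a cumulative trapezoidal sum, and the contribution from inside the innermost radius is approximated crudely by assuming a constant surface density there.

```python
import math

def sigma_projected(rho, big_r, chi_max, n=2000):
    """Sigma(R): integral of rho(sqrt(R^2 + chi^2)) dchi over |chi| <= chi_max.

    With chi = R sinh(u), the 3d radius is R cosh(u) and dchi = R cosh(u) du,
    which keeps the integrand smooth even when R is much smaller than chi_max.
    """
    u_max = math.asinh(chi_max / big_r)
    step = u_max / n
    total = 0.0
    for i in range(n):
        cosh_u = math.cosh((i + 0.5) * step)
        total += rho(big_r * cosh_u) * big_r * cosh_u
    return 2.0 * total * step

def delta_sigma_profile(rho, radii, chi_max):
    """Return (Sigma, mean Sigma(<R), DeltaSigma) on an increasing grid of radii."""
    sigma = [sigma_projected(rho, r, chi_max) for r in radii]
    # Crude inner contribution: treat Sigma as constant inside radii[0].
    cumulative = 0.5 * sigma[0] * radii[0] ** 2
    mean_sigma = [2.0 * cumulative / radii[0] ** 2]
    for k in range(1, len(radii)):
        cumulative += 0.5 * (sigma[k] * radii[k] + sigma[k - 1] * radii[k - 1]) \
            * (radii[k] - radii[k - 1])
        mean_sigma.append(2.0 * cumulative / radii[k] ** 2)
    delta_sigma = [m - s for m, s in zip(mean_sigma, sigma)]
    return sigma, mean_sigma, delta_sigma

# Check with rho = 1 / r^2, for which Sigma(R) = (2 / R) arctan(chi_max / R)
# and DeltaSigma(R) = pi / R independent of chi_max.
grid = [1e-3 * 3e3 ** (k / 119.0) for k in range(120)]
sig, mean_sig, dsig = delta_sigma_profile(lambda r: 1.0 / r ** 2, grid, 1.0e3)
```

The grid should start well inside the smallest radius of interest, since the error from the inner approximation decays as the ratio of the innermost radius to $R$.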

To make this theoretical signal, defined in very narrow
bins without any noise, look like an observed signal, we then
do the following. First, we use a spline to determine the values
of $\Delta\Sigma$ at the center of the bins in $R$ used to calculate
the real signal for maxBCG clusters in Mandelbaum et al.
(2008a). Second, we choose a cluster richness subsample
from that paper with roughly comparable mass to the theoretical
signal we are using. We estimate a power-law function
for the (bootstrap-determined) errors as a function of radius
from our selected cluster subsample, to avoid the influence
of any noise in the determination of the covariances. We use
this power-law to assign a variance to the theoretical signal
as a function of transverse separation. Finally, since the
signal in the different radial bins was found to be nearly
uncorrelated for all scales used in that paper, we add noise
to our theoretical signals using a Gaussian distribution with
a diagonal covariance matrix. This procedure was performed
1000 times to generate 1000 realizations of the lensing data.
For context, the input level of noise is typically sufficient to
achieve $\sim 20$ per cent statistical uncertainty on the best-fit
masses at the $1\sigma$ level, when using $\Delta\Sigma$ with $R < 4 h^{-1}$ Mpc
to fit for the mass.
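The noise-generation step can be sketched as below. The power-law amplitude and slope, the bin centres, and the signal values are hypothetical placeholders (the paper's values come from bootstrap errors on the real data and are not reproduced here); only the structure of the procedure, independent Gaussian noise per bin repeated 1000 times, follows the description above.

```python
import random

def power_law_error(r, amp, slope, r_pivot=1.0):
    """Hypothetical power-law model for the 1-sigma error on DeltaSigma at radius r."""
    return amp * (r / r_pivot) ** slope

def make_realisations(radii, signal, amp, slope, n_real=1000, seed=0):
    """Add independent Gaussian noise (diagonal covariance) to a noiseless signal."""
    rng = random.Random(seed)
    sigmas = [power_law_error(r, amp, slope) for r in radii]
    mocks = [[s + rng.gauss(0.0, e) for s, e in zip(signal, sigmas)]
             for _ in range(n_real)]
    return mocks, sigmas

# Illustrative inputs only: bin centres and a made-up noiseless signal.
radii = [0.1, 0.3, 1.0, 3.0]
signal = [200.0, 80.0, 20.0, 5.0]
mocks, sigmas = make_realisations(radii, signal, amp=4.0, slope=-1.0)
```

Fixing the seed makes the set of realisations reproducible, which is convenient when comparing several fitting methods on identical mock data.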

The input lensing signals $\Delta\Sigma(R)$ and $\Upsilon(R; R_0)$ (before
the addition of noise) with several $R_0$ values are shown in
Fig. 1 for the higher mass value, $\log_{10}[hM_{200b}/M_\odot] = 14.8$.
Since we will also test the effect of centroiding errors,
which were discussed in detail in Section 2.2, we apply the
offset model from Johnston et al. (2007). For offset fractions,
we have chosen 20 per cent for this mass scale; for
$\log_{10}[hM_{200b}/M_\odot] = 14.0$, we will use 35 per cent (roughly
in accordance with the trends with richness in that paper).

Table 2. Properties of cluster lensing profiles, both analytical (pure NFW) and those from N-body simulations. We show the mean
number density of the sample for the mass-selected samples from N-body simulations; the virial mass and radius $M_{200b}$ and $r_{200b}$ (exact
value for the pure NFW profiles, and the ensemble mean for the samples from N-body simulations); the analytical profiles used for
resolution corrections of the N-body simulations; and the best-fitting NFW profiles when fitting the simulation lensing signals $\Delta\Sigma(R)$
for scales $0.2 \le R \le 2 h^{-1}$ Mpc.

Figure 1. Top panel: from top to bottom, we show $\Delta\Sigma(R)$ and $\Upsilon(R; R_0)$ with $R_0 = 0.25$, 0.5, and $1 h^{-1}$ Mpc. The solid lines are for
$c_{200b} = 4$ and the dashed lines are for $c_{200b} = 7$; in both cases, $\log_{10}[hM_{200b}/M_\odot] = 14.8$. Middle panel: without inclusion of centroid
offsets, we show the ratio of these four quantities for $c_{200b} = 4$ versus $c_{200b} = 7$, where the line types indicate which quantity is used to
construct the ratio, and the horizontal dotted line indicates a ratio of 1. Bottom panel: assuming $c_{200b} = 7$, we show the ratio of these
four quantities when including centroiding offsets versus not, with the same line styles as in the middle panel.

As expected, $\Delta\Sigma(R)$ for $c_{200b} = 7$ is higher than that
for $c_{200b} = 4$ on small scales; the radius at which they cross
over is relatively large because $\Delta\Sigma(R)$ includes information
from $\Sigma(R)$ for small $R$. For the 3d $\rho(r)$, the cross-over radius
is within the virial radius by necessity, since the masses
are the same. As we increase $R_0$ in $\Upsilon(R; R_0)$, the trend
going from $c_{200b} = 4$ to $c_{200b} = 7$ gets less pronounced,
because even though $\Delta\Sigma(R)$ is larger on small scales for
$c_{200b} = 7$, that also means the value that is subtracted off
to obtain $\Upsilon(R; R_0)$ is larger. Thus, by the time we reach
$R_0 = 1 h^{-1}$ Mpc, $\Upsilon(R; R_0)$ is actually higher for $c_{200b} = 4$
than for $c_{200b} = 7$ for all $R > R_0$.

As shown in the bottom panel, the effect of centroiding
errors is quite pronounced on $\Delta\Sigma(R)$. The characteristic
scale of the offsets is $0.42 h^{-1}$ Mpc, and the signal is noticeably
suppressed out to three times this scale. The use of
$\Upsilon(R; R_0)$ ameliorates this effect, and it even gets reversed
for larger $R_0$, similar to what happens with the different
concentration values. While for $\Delta\Sigma(R)$, the offsets cause
suppression of the signal for all affected scales, for $\Upsilon(R; R_0)$,
the signal is suppressed on smaller scales and elevated on
larger scales, which suggests that biases in parametric mass
modeling due to these offsets may be smaller because the
small and large scale changes in sign may cancel out.



5.1.1 Parametric modeling

In this section, we begin by fitting the pure NFW lensing
signals for $\log_{10}[hM_{200b}/M_\odot] = 14.8$ to pure NFW profiles.
This procedure allows us to assess the systematic uncertainty
due to the assumption of a fixed concentration when
using various parametric fit procedures. For each noise
realisation, we attempted to determine a mass using several
fitting procedures:

• Assuming an NFW profile with $c_{200b} = 4$ and $c_{200b} = 7$.

• Using $\Delta\Sigma(R)$ with minimum fit radii ($R_{\rm min}$) values
ranging from 0.1 to $2 h^{-1}$ Mpc, and maximum fit radii of
$R_{\rm max} = 1$, 2, and $4 h^{-1}$ Mpc.

• Using $\Upsilon(R; R_0)$ with $R_0 = 0.25$, 0.5, and $1 h^{-1}$ Mpc,
again with a variety of $R_{\rm min}$ values (always with $R_{\rm min} \ge R_0$).
The value of $\Delta\Sigma(R_0)$ was determined on each noisy realisation
rather than from the well-determined mean over those
scenarios, consistent with a real measurement for which
we only have one observation of the lensing signal for a
given sample. The estimation was done by fitting the data
to the three-parameter functional form in Eq. (15) from
$0.1 < R < 0.5$, $0.3 < R < 1$, and $0.7 < R < 1.3 h^{-1}$ Mpc for
$R_0 = 0.25$, 0.5, and $1 h^{-1}$ Mpc, respectively.

In detail, the fits to $\Delta\Sigma(R)$ are performed via $\chi^2$ minimization
in comparison with theoretical signals that were
generated via the procedure described at the start of Section
5.1. Thus, for each of the lensing signal realizations $j$,
denoted $\Delta\Sigma_j^{\rm (data)}(R_i)$ (for bins in transverse separation with
index $i$ such that $R_{\rm min} \le R_i \le R_{\rm max}$) with noise variance
$\sigma^2(\Delta\Sigma_j(R_i))$, we use the Levenberg-Marquardt algorithm
(Levenberg 1944; Marquardt 1963; Press et al. 1992) to find




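the NFW profile mass that minimizes

$$\chi^2 = \sum_i \frac{\left[\Delta\Sigma_j^{\rm (data)}(R_i) - \Delta\Sigma^{\rm (model)}(R_i | M_{200b}, c_{200b})\right]^2}{\sigma^2(\Delta\Sigma_j(R_i))} \tag{19}$$

at fixed $c_{200b}$.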
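As a sketch of the one-parameter minimisation in Eq. (19), the example below evaluates $\chi^2$ for a trial mass and searches over $\log_{10} M$ at fixed concentration. Two substitutions are made for brevity and should be read as assumptions: a golden-section search replaces the Levenberg-Marquardt algorithm used in the paper, and `toy_model` is a hypothetical stand-in for the NFW prediction, not the real NFW lensing signal.

```python
import math

def chi2(mass, radii, data, sigmas, model, c200b):
    """Eq. (19): sum over bins of [data - model(R | M, c)]^2 / variance."""
    return sum(((d - model(r, mass, c200b)) / s) ** 2
               for r, d, s in zip(radii, data, sigmas))

def fit_mass(radii, data, sigmas, model, c200b,
             log_m_lo=13.0, log_m_hi=16.0, tol=1e-6):
    """Golden-section search over log10(M) for the minimum of chi2 at fixed c200b."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_m_lo, log_m_hi

    def f(log_m):
        return chi2(10.0 ** log_m, radii, data, sigmas, model, c200b)

    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 10.0 ** (0.5 * (a + b))

def toy_model(r, mass, c200b):
    """Hypothetical stand-in for DeltaSigma(R | M, c); it ignores c200b."""
    return 100.0 * (mass / 1e14) ** (2.0 / 3.0) / r

radii = [0.2, 0.5, 1.0, 2.0, 4.0]
true_mass = 6.3e14
data = [toy_model(r, true_mass, 4.0) for r in radii]
sigmas = [0.1 * d for d in data]
best_mass = fit_mass(radii, data, sigmas, toy_model, 4.0)
```

Restricting the sum to bins with $R_{\rm min} \le R_i \le R_{\rm max}$ only requires filtering the three input lists before calling `fit_mass`.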

The fits to $\Upsilon(R; R_0)$ require an additional step: the
conversion of both the theoretical signals ($\Delta\Sigma^{\rm (model)}$, defined
without noise in very narrow bins in $R$) and the mock
data ($\Delta\Sigma^{\rm (data)}$, defined in realistically broad bins with added
noise) from $\Delta\Sigma(R)$ to $\Upsilon(R; R_0)$. In practice, the theoretical
signal is defined such that we can very accurately interpolate
to determine the value of $\Delta\Sigma(R_0)$, which is then used to
construct $\Upsilon(R; R_0)$ directly using Eq. (14). For the noisy mock
data, we must use a different procedure. We fit to $\Delta\Sigma(R)$
to estimate $\Delta\Sigma(R_0)$ using Eq. (15), so that $\Upsilon(R; R_0)$ can
be constructed. We will shortly discuss more details of this
procedure, because we find that the exact way of getting
$\Delta\Sigma(R_0)$ is important: some methods introduce a bias on
the mass, others add extra noise, neither of which is desirable.
Once $\Upsilon(R; R_0)$ is determined for the mock signals, we
then determine its covariance matrix using the distribution
of values for all datasets. Finally, we minimize the $\chi^2$ function
for each mock realization using Eq. (19) with $\Upsilon(R; R_0)$
in place of $\Delta\Sigma(R)$.

We then examined the distribution of best-fit masses
for the 1000 noise realisations to find the mass at the 16th,
50th (median), and 84th percentile. We define the spread
in the masses, $\sigma_M$, as being half the difference between the
84th and 16th percentile (which would be the standard deviation
for a Gaussian distribution). The mass distributions
are sufficiently close to Gaussian that using the mean rather
than the median, and using the standard deviation directly,
would not change the plots substantially. The median
best-fitting mass $M_{200b,{\rm est}}$ relative to the input mass $M_{200b,{\rm true}}$,
and the spread in the best-fitting masses, are shown for both
input profiles and each fit method as a function of $R_{\rm min}$ in
Fig. 2. The criterion that we apply when selecting a robust
mass estimator is that the ratio $M_{200b,{\rm est}}/M_{200b,{\rm true}}$ should
not depend strongly on the input or output $c_{200b}$ (though a
systematic offset independent of input and output $c_{200b}$ is
acceptable, since simulations can be used to correct for it).
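The summary statistics described above can be sketched as follows. The linear-interpolation percentile convention is an assumption of this sketch (the paper does not specify one), and the function names are illustrative.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def mass_summary(masses):
    """Return (median, sigma_M), with sigma_M half the 84th-16th percentile range."""
    p16, p50, p84 = (percentile(masses, q) for q in (16, 50, 84))
    return p50, 0.5 * (p84 - p16)

# Simple check on evenly spaced values 0, 1, ..., 100.
median_example, sigma_example = mass_summary(list(range(101)))
```

Dividing the median by the input mass gives the bias ratio plotted in the top row of Fig. 2, and dividing the spread by the input mass gives the fractional statistical error shown in the bottom row.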

We begin by considering the trends in the ratio
$M_{200b,{\rm est}}/M_{200b,{\rm true}}$ with fitting method. When assuming
$c_{200b} = 4$ while fitting to the profile with true $c_{200b} = 7$,
as shown in Fig. 2, the fits to $\Delta\Sigma$ in the upper right panel
with $R_{\rm max} = 4 h^{-1}$ Mpc give $\sim 25$ per cent overestimation
of the mass for $R_{\rm min} \le 0.5 h^{-1}$ Mpc, improving to 3 per cent
with $R_{\rm min} = 2 h^{-1}$ Mpc (with, however, a doubling of the
statistical error). The mass is overestimated in this case because
for the majority of the radial range used for the fitting,
the lensing signal for $c_{200b} = 4$ for this mass is below that for
$c_{200b} = 7$ (Fig. 1), so the fitting routine compensates for the
discrepancy by returning a higher mass. This trend of overestimated
masses is decreased and eventually even reversed
in sign for $\Upsilon(R; R_0)$ as we increase $R_0$, for reasons that are
clear from Fig. 1. The reverse situation, with input $c_{200b} = 4$
and assumed $c_{200b} = 7$, leads to biases $M_{200b,{\rm est}}/M_{200b,{\rm true}}$
that are the inverse of the biases shown in Fig. 2, so we
do not show this case in the figures. As shown, when using
$\Upsilon(R; R_0)$ with $R_{\rm min} = R_0$, the statistical error increases
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Figure 2. Results of parametric mass fits on noisy realisations of pure NFW profiles, with input log 1( ) [WW^oob/-^©] = 14.8 and 
c 200b = 7, but C2006 = 4 assumed in the fits. The top and bottom rows show the ratio M2oob, est /^200b true an d the statistical error 
cr(M2oob cst)/-^2006 true: respectively. The latter is shown normalised to the minimum value of cr(M2oob est)/-^2006 true ~ 0.2, which is 
obtained for the fit using the nicLxinruni information, ASf/?) with — 0.1 and -Rmax 

= 4/i _1 Mpc. The results are shown for various 
fitting methods (indicated with various line and point types shown on the plot) as a function of the minimum fit radius -R m j n . From left 
to right, the panels show increasing i? ma x values of 1, 2, and 4h _1 Mpc. On the upper rightmost panel, the thin (blue) lines and points 
show the corresponding results for the log 10 [7iM2oof>/A^©] = 14.0 profile. 



over the minimum possible value from the AS(ii) fits by 
factors of 1.14, 1.32, and 2.25 when using R = 0.25, 0.5, 
and l/i -1 Mpc, respectively. 

When fitting Υ(R; R_0) for all R_0 and R_min, if we use a
power law to fit for ΔΣ(R_0) (i.e., q = 0 in Eq. (15)), then
M_200b,est is consistently ~3-5 per cent above M_200b,true even
if the correct concentration is assumed in the fit. This overestimation
of the mass occurs because the data are not consistent
with a power law. Due to the trend of the signal with
radius, the power-law fit tends to underestimate ΔΣ(R_0),
thus overestimating Υ(R; R_0) and therefore M_200b,est. However,
we find that a full three-parameter fit significantly increases
the noise, so we instead use a two-step procedure:
we first fit with fixed q = 0 in Eq. (15) to get a mass, then
we use the best-fitting signal to estimate q at R_0, and use
that fixed q value for a second two-parameter fit for ΔΣ(R_0),
which is used for a second fit to Υ(R; R_0) to get the mass.
For the remainder of this work, we present results using that
fitting procedure in order to best estimate the mass without
increasing the noise too much.
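
As an illustration of the first (pure power-law) step only, the sketch below estimates ΔΣ(R_0) from a log-space power-law fit and then constructs the new statistic. It assumes that Eq. (14) takes the form Υ(R; R_0) = ΔΣ(R) - (R_0/R)^2 ΔΣ(R_0); the function names are hypothetical, and the second step with fixed non-zero q is not reproduced here.

```python
import numpy as np

def fit_power_law_at_r0(R, delta_sigma, R0):
    """Estimate DeltaSigma(R_0) from a power-law fit (the q = 0 case).

    The fit is linear in log space, so it requires delta_sigma > 0 in
    every bin; noisy data with negative bins would need a fit in
    linear space instead.
    """
    R = np.asarray(R, dtype=float)
    delta_sigma = np.asarray(delta_sigma, dtype=float)
    slope, intercept = np.polyfit(np.log(R / R0), np.log(delta_sigma), 1)
    return float(np.exp(intercept))  # value of the fit at R = R_0

def upsilon(R, delta_sigma, R0, delta_sigma_R0):
    """Assumed form of Eq. (14): remove the contribution from R < R_0."""
    R = np.asarray(R, dtype=float)
    delta_sigma = np.asarray(delta_sigma, dtype=float)
    return delta_sigma - (R0 / R) ** 2 * delta_sigma_R0
```

By construction the statistic vanishes at R = R_0, so any error in the estimate of ΔΣ(R_0) propagates most strongly to the bins just above R_0.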

Our criterion for a robust mass estimator on stacked
cluster lensing data is that it should have systematic error
that is relatively independent of the input c_200b or the assumed
c_200b for the fit, at least when compared to the size
of the statistical error. However, this robustness should not
be achieved at the expense of too large an increase in the
statistical error. As shown, the fits to ΔΣ(R) do not satisfy
our robustness criterion, because assuming the wrong concentration
can lead to a systematic error that is tens of per
cent for reasonable R_min. Υ(R; R_0) with R_0 = 0.25 h^-1 Mpc
improves somewhat on ΔΣ(R) in this regard, and for R_min =
1 h^-1 Mpc achieves a good combination of low systematic error
and only a small increase in statistical error. Υ(R; R_0)
with R_0 = 0.5 h^-1 Mpc satisfies our criterion for robustness
when using R_min = R_0 while increasing the error by about
20 per cent. A value of R_0 = 1 h^-1 Mpc erases too much
information and doubles the statistical errors. For individual
cluster lensing, the criterion for a robust mass estimator
may differ, since if one adds many more clusters then the
statistical error may further decrease below the systematic
error, so an even smaller systematic error is required.

The systematic errors shown here may be overly pessimistic
for stacked data, given the wide variation in concentration
that was allowed relative to what is seen in N-body
simulations. However, several other systematics discussed in
Sections 2.2 and 2.3 can mimic a change in concentration,
such as baryonic effects. Thus, it is only reasonable that
we should consider a broader range of concentrations than
in the N-body simulations. When considering a narrower
range, such as 4 < c_200b < 5, the biases in the masses when
fitting to ΔΣ with a fixed concentration are typically of order
10 per cent, or < 2 per cent when fitting to Υ(R; R_0).
For individual cluster lensing data, given the large lognormal
scatter in concentration seen in simulations, the systematic
errors we quote are not overly pessimistic. Furthermore,
at this level of signal to noise, the fit χ^2 values are perfectly
acceptable even for the wrong value of concentration,
so goodness-of-fit cannot be used to tell whether there is a
systematic error.

In the upper right panel of Fig. 2, there are thin (blue)
lines corresponding to a lower mass model that can be used
to assess the mass-dependence of these systematic biases. As
shown, the mass overestimation when fitting to ΔΣ(R) is not
as severe for the lower mass cluster as for the higher mass
cluster at fixed R_min (because the strongly concentration-dependent
part of the inner profile has moved to smaller
radii). The virial radius for this mass is a factor of about 1.85 smaller
than for the higher mass model, suggesting that the choice
of R_0 should be mass dependent, with the optimal value
of 15-25 per cent of the virial radius. In practice, this relation
between the virial radius and R_0 could be achieved
iteratively by choosing some default value of R_0, fitting with
that value of R_0, and then using the resulting best-fit mass
to choose a more appropriate value of R_0 via

R_0 = (0.25\,h^{-1}\,\mathrm{Mpc}) \left( \frac{M_{200b}}{10^{14}\,h^{-1} M_\odot} \right)^{1/3} . \qquad (20)

Here we have assumed Ω_m = 0.25 and a spherical overdensity
of 200ρ̄, and use comoving coordinates.
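
The iterative choice of R_0 described above can be sketched as below. This is a hedged illustration: it assumes that Eq. (20) scales R_0 as the cube root of the mass, normalised to 0.25 h^-1 Mpc at a pivot mass taken here to be 10^14 h^-1 M_⊙ (an assumption, chosen so that R_0 is roughly 20 per cent of the virial radius). The callable `fit_mass` is a hypothetical stand-in for the full fit to Υ(R; R_0).

```python
def r0_from_mass(m200b, r0_norm=0.25, m_pivot=1.0e14):
    """R_0 in h^-1 Mpc for a mass in h^-1 M_sun (assumed normalisation)."""
    return r0_norm * (m200b / m_pivot) ** (1.0 / 3.0)

def iterate_r0(fit_mass, r0_start=0.5, tol=1e-3, max_iter=20):
    """Alternate between fitting the mass and updating R_0.

    fit_mass: hypothetical callable returning the best-fit M_200b
    for a given R_0. Returns the final (R_0, mass) pair.
    """
    r0 = r0_start
    mass = fit_mass(r0)
    for _ in range(max_iter):
        r0_new = r0_from_mass(mass)
        if abs(r0_new - r0) < tol:
            return r0_new, mass
        r0 = r0_new
        mass = fit_mass(r0)
    return r0, mass
```

Because the best-fit mass depends only weakly on R_0, a loop of this kind would be expected to settle after a few passes.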

We also note that the fitted masses are weakly
cosmology-dependent. For a fixed density profile, the mass
that we estimate depends on the assumed Ω_m, with M_200b ∝
Ω_m^-0.25 (we confirmed this scaling for the limited range of
0.2 ≤ Ω_m ≤ 0.3). The Ω_m dependence has two sources: first,
we rescale the transverse separation and signal amplitude
to account for the Ω_m dependence of the distance measures
used to convert θ and γ_t to R and ΔΣ, and second (and more
significantly), the halo mass definition changes since we use
a spherical over-density of 200ρ̄. Thus, for higher Ω_m, the
over-density we use is larger, which reduces the mass and
virial radius, also decreasing the concentration c_200b since
the scale radius is held fixed.
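
The dependence of the mass definition on Ω_m can be made concrete with the standard relation between M_200b and the comoving virial radius. The sketch below is illustrative only; the function name is hypothetical, and it shows the definition of the radius rather than reproducing the Ω_m^-0.25 scaling of the fitted masses.

```python
import math

RHO_CRIT = 2.775e11  # critical density today, in h^2 M_sun / Mpc^3

def r200b(m200b, omega_m=0.25, overdensity=200.0):
    """Comoving radius (h^-1 Mpc) enclosing overdensity * mean density.

    m200b is in h^-1 M_sun. A larger omega_m raises the mean density,
    so the same mass corresponds to a smaller radius.
    """
    rho_mean = omega_m * RHO_CRIT  # comoving mean matter density
    volume_factor = 4.0 * math.pi * overdensity * rho_mean
    return (3.0 * m200b / volume_factor) ** (1.0 / 3.0)
```

For the two mass models used here, log10[hM_200b/M_⊙] = 14.8 and 14.0, the ratio of the two radii is 10^(0.8/3), which is the factor of about 1.85 quoted above.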

While stacked cluster lensing analyses from large surveys
can provide cluster lensing data to tens of h^-1 Mpc,
individual cluster lensing analyses that are not survey-based
typically have a limit of R_max = 1-2 h^-1 Mpc depending on
the cluster redshift and telescope field of view. Consequently,
we also explore the dependence of our results on the maximum
scale used for the fits. Based on Fig. 1, we expect that
the biases will be even higher in this case, since when restricting
to smaller scales the differences between the lensing
profiles ΔΣ(R) are more pronounced for the different values
of c_200b.

The results of this test are shown only for the
log10[hM_200b/M_⊙] = 14.8 and c_200b = 7 profile, with assumed
c_200b = 4, in the different columns of Fig. 2. As
expected, when we decrease R_max (moving right to left
across the figure), the systematic errors increase fairly drastically.
For R_max = 1 h^-1 Mpc, the best we can achieve
for the fitting methods tested here is with Υ(R; R_0) with
R_0 = 0.5 h^-1 Mpc, and even that method has a 25 per
cent systematic error. For R_max = 2 h^-1 Mpc, Υ(R; R_0) with
R_0 = 0.5 h^-1 Mpc gives several per cent systematic errors for
both R_min = 0.5 and 1 h^-1 Mpc. It is clear that the existence
of data to R_max = 4 h^-1 Mpc (≈ 2 r_200b) is very helpful in
decreasing the systematic and statistical errors.

These results suggest that the choice of mass estimator
may depend on the maximum scale to which the lensing
data can be measured for a given dataset. If 1 h^-1 Mpc is the
maximum scale for which data are available, then truly robust
parametric measures of mass may be difficult to find; in the
next section, we explore whether non-parametric measures
may be better than parametric ones in this case. For larger
values of R_max, Υ(R; R_0) with R_0 = 0.5 h^-1 Mpc seems adequate
from the perspective of minimising the combination
of systematic and statistical error.

We next consider the effect of cluster centroiding errors,
which were discussed in detail in Section 2.2. Note that our
results here are more general than that particular systematic
error, since several observational systematics in Section 2.3
have a similar form. We use the signals with c_200b = 4 and 7
for both log10[hM_200b/M_⊙] = 14.0 and 14.8, and apply the
offset model from Johnston et al. (2007) as described in the
beginning of Section 5.1. It is important to note that this
is only one example of how photometric errors in imaging
data can cause centroiding errors for the cluster catalogue.

In Fig. 3, we show the results of the NFW mass fits
to the profiles, with this offset distribution imposed on the
data but ignored in the fit. Because Fig. 2 suggested that
using Υ(R; R_0) with R_0 = 1 h^-1 Mpc degrades the S/N unacceptably,
we have only shown results for fits to ΔΣ(R) and
for Υ(R; R_0) with R_0 = 0.25 and 0.5 h^-1 Mpc. As shown,
for the input c_200b = 4 models,
even when the correct c_200b is assumed in the fit to ΔΣ,
the best-fitting masses are reduced by 5-25 per cent (lower
mass) and by up to 7 per cent (higher mass) depending on
R_min. For the higher mass model, we find that Υ(R; R_0)
with R_0 = R_min = 0.5 h^-1 Mpc gives fairly consistent results
regardless of the input and assumed concentration. For
the lower mass model, Υ(R; R_0) with R_0 = R_min = 0.25 h^-1
Mpc gives the most consistent results regardless of assumed
R_min. Moving to the right column of this figure,
for input c_200b = 7, we see that even with the correct assumed
c_200b, fitting with ΔΣ(R) can lead to underestimated
masses by up to 30 per cent (lower mass) or 10 per cent
(higher mass) depending on R_min. As for the input c_200b = 4
model, we find that the fitting technique and minimum scale
that is most independent of assumed c_200b is Υ(R; R_0) with
R_0 = R_min = 0.5 h^-1 Mpc and 0.25 h^-1 Mpc for higher and
lower mass scales, respectively. The ability of Υ(R; R_0) to robustly
estimate masses even with these centroiding errors is
a consequence of what we have noted in the bottom panel of
Fig. 1, that the centroiding errors lead to biases in Υ(R; R_0)
that change sign at some intermediate scale, so their effects
approximately cancel out.

Figure 3. Here, we show the ratio M_200b,est/M_200b,true for the pure c_200b = 4 (left column) and c_200b = 7 (right column) NFW models
with log10[hM_200b/M_⊙] = 14.0 (bottom row) and 14.8 (top row) after including effects of centroiding errors in the mock data. Results
for the various fitting methods are shown as a function of the minimum fit radius R_min, for fixed R_max = 4 h^-1 Mpc. The different point
styles and colours as indicated on the plot show what type of fitting was done (ΔΣ(R) or Υ(R; R_0) with various R_0 values); the different
line types (solid versus dashed) indicate which value of c_200b was assumed. The dotted horizontal lines indicate a ratio of 1, the ideal
unbiased case.



One important point raised by Fig. 3 is that the mass
estimates using c_200b = 4 (assumed) are less affected by centroid
offsets. This finding results from the fact that with
a low concentration, the model already includes a relatively
low level of mass in the inner cluster regions, and therefore is
less affected than a higher concentration halo. Thus, it may
be advantageous to assume a concentration at the low end
of the expected range when fitting to Υ(R; R_0) in scenarios
involving possibly substantial offsets of the chosen BCG
from the true cluster center.



5.1.2 Non-parametric modeling 

In this section, we use the same noisy realisations of theoretical
cluster profiles as in the previous section, but we estimate
masses using the aperture mass statistic ζ_c. In this case, we
begin with the NFW profile with log10[hM_200b/M_⊙] = 14.8
and c_200b = 7. We try various options for the different aspects
of this analysis:

• Varying R_1 (the radius below which we are trying to estimate
the enclosed mass, using the shear above that radius)
between three values: 0.275, 0.5, and 1.1 h^-1 Mpc.

• Varying R_o1 between two values: 1.1 and 2.0 h^-1 Mpc.

• Varying R_o2 between two values: 2 and 4 h^-1 Mpc
(maintaining at all times the strict hierarchy R_1 < R_o1 <
R_o2).

• Neglecting the second term in Eq. (12) as in
Okabe et al. (2009), and estimating it using the best-fit
NFW profile with some assumed concentration, as in
Hoekstra (2007). We do not test the case in which the integral
from R_o1 to R_o2 may be done analytically, because often
for individual cluster lensing studies, this is not even possible
since R_o2 is outside the field of view. With survey data
or mosaic telescope data, the signal may indeed be measured
to R_o2, but it is typically quite noisy on those large scales,
so this procedure would introduce even more noise into the
estimated masses.

• Assuming c_200b = 4 and 7 whenever a profile assumption
is necessary: for the estimate of the second term in
Eq. (12), and for the conversion from M_2d(< R_1) to the 3d
M_200b.

The procedure is as follows. We use the (noisy) realizations
of the lensing signal for pure NFW profiles in logarithmic
annular bins to estimate ζ_c using a given set of radii
(R_1, R_o1, R_o2). Thus, we use the signal for R_1 < R < R_o1
to calculate the first term in Eq. (12) via direct summation
over the noisy mock data in broad logarithmic bins in R. We
also estimate the second term using the fits to ΔΣ(R) for
R_1 < R < 4 h^-1 Mpc for the assumed value of c_200b. To do
so, we use the lensing profile for the best-fitting M_200b, determined
to high precision as in the start of Section 5.1.1, and
estimate the second term using direct summation over the
numerically-determined (non-noisy) profile in narrow logarithmic
bins in R. Given ζ_c estimated with and without the
second term, we then use our assumed c_200b to convert the
M_2d(< R_1) to a 3d virial mass M_200b, which (at fixed c_200b)
is a simple one-to-one mapping that can be determined via
numerical integration.
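
The two terms of the aperture statistic can be sketched numerically as follows. This assumes that Eq. (12) has the standard aperture-densitometry form, written here in units of ΔΣ: a first term 2 ∫ ΔΣ d ln R from R_1 to R_o1, plus a second term 2/(1 - R_o1^2/R_o2^2) ∫ ΔΣ d ln R from R_o1 to R_o2. The function names are hypothetical, and a smooth callable profile stands in for the binned data.

```python
import numpy as np

def _integrate_dlnR(R, y):
    """Trapezoidal integral of y with respect to ln R."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(np.log(R))))

def zeta_c_terms(delta_sigma, R1, Ro1, Ro2, n=2000):
    """Return (first, second) terms of the assumed form of Eq. (12).

    delta_sigma: callable giving DeltaSigma at an array of radii.
    The result is in DeltaSigma units (mass per unit area).
    """
    R_in = np.geomspace(R1, Ro1, n)    # R_1 < R < R_o1
    R_out = np.geomspace(Ro1, Ro2, n)  # outer annulus R_o1 < R < R_o2
    first = 2.0 * _integrate_dlnR(R_in, delta_sigma(R_in))
    outer = _integrate_dlnR(R_out, delta_sigma(R_out))
    second = 2.0 / (1.0 - (Ro1 / Ro2) ** 2) * outer
    return first, second

def m2d_from_zeta(zeta, R1):
    """Projected mass within R_1 implied by the statistic."""
    return np.pi * R1 ** 2 * zeta
```

Dropping `second` corresponds to the "neglect second term" case, which is biased low whenever the outer annulus contains a non-negligible signal.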

In Table 3, we present the following, first without the
correction term for the outer annulus and then with it:
the accuracy in recovering M_2d(< R_1), the accuracy in
recovering M_200b, and the statistical error on the recovered
M_200b relative to that from the fit to ΔΣ(R) using
R_1 < R < 4 h^-1 Mpc. These results are shown for both assumed
concentration values, c_200b = 4 and 7, given the true
profile with log10[hM_200b/M_⊙] = 14.8 and c_200b = 7.

There are a few conclusions that can be drawn from this
table. First, we begin with the idealized case in the top section
of the table, where the assumed c_200b is the same as the
true one. In this case, we see that depending on the configuration
of the three radii used to estimate ζ_c, the projected
mass may be underestimated by 5-40 per cent if the second
term in Eq. (12) is ignored. This underestimate is propagated
into an underestimate of the 3d M_200b that ranges
from 10-45 per cent. This underestimate due to ignoring the
mass in the outer annulus is less important for R_o1 ≫ R_1
than it is for cases where the two radii are relatively close to
each other. We also see that the statistical error on the inferred
M_200b from the aperture mass is typically comparable
to that for the fits to ΔΣ using R_1 < R < 4 h^-1 Mpc.

In this ideal case with the correct assumed c_200b, correcting
for the second term in ζ_c using the best-fitting profile
to ΔΣ(R) for R_1 < R < 4 h^-1 Mpc leads to unbiased recovery
of both M_2d(< R_1) and M_200b; however, the statistical
errors on M_200b are larger than when fitting to ΔΣ(R) by
typically tens of per cent. This higher level of noise is due
to the noisy profile used to estimate the second term in ζ_c.

Next, we consider the lower half of the table, in which
we use a profile with c_200b = 7, and assume c_200b = 4. First,
when we do not include the second term in Eq. (12). Second,
when we include the second term in Eq. (12), the projected
masses are all slightly overestimated (by several per cent),
and the 3d M_200b are overestimated by 20-80 per cent (depending
on R_1, with smaller R_1 leading to larger biases).
We can explain the slight overestimation of the M_2d when
including the second term in ζ_c by the fact that we do the
correction using profiles with a low c_200b, which give too
much mass in the outer regions from which the second term
is derived. The significant overestimation of M_200b arises because,
when we assume too low a concentration, we
anticipate a profile with a low amount of mass on small
scales, so the conversion factor from M_2d(< R_1) to M_200b is
a large number. This effect will be worse for small R_1, since
the difference between the lensing profiles for different concentrations
is most significant there. If we allow a smaller
variation, such as true c_200b = 5 and assumed c_200b = 4,
then we find a 10-20 per cent effect on the 3d virial masses.

In general, the results for an input profile with c_200b = 4
can be understood as the inverse of the results given in Table 3.
However, for a less concentrated profile, the bias in
M_2d due to neglect of mass in the outer annulus is more
significant. For a lower mass halo and fixed transverse separation,
the mass in the outer annulus is less important.

We next consider the effect of centroiding errors on the
aperture mass. When using the two mass models, we find
that the projected masses M_2d are systematically suppressed
by 10-14 per cent due to centroiding errors. The exact level
of suppression depends slightly but not very strongly on the
value of R_1 in the range we have considered, and this suppression
is then propagated into a suppression of M_200b.

Because of the definition of ζ_c, biases in the lensing
signal calibration that can be expressed as a single scale-dependent
factor enter linearly into the estimated masses in
projection, M_2d ∝ ΔΣ. However, when using some model
for the spherical density profile to estimate the mass within
some radius defined in terms of a spherical over-density, such
as M_200b, the mass will scale even more strongly with ΔΣ,
because as the signal increases, the spherical over-density radius
moves outward, thus including more mass in the total.
The exact scaling of the enclosed mass within some spherical
over-density depends on the model used to define the
appropriate radius, and on which over-density is used, but
typically the inferred M_200b ∝ ΔΣ^1.5.
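
As a short worked example of these scalings, the sketch below propagates a multiplicative calibration factor on ΔΣ into the inferred mass. The exponent of 1.5 is the typical value quoted above rather than an exact result, and the function name is hypothetical.

```python
def mass_bias_from_calibration(calib_factor, index=1.5):
    """Ratio of inferred to true mass for a signal scaled by calib_factor.

    index = 1.0 corresponds to the projected mass M_2d, and
    index = 1.5 to the typical scaling of the 3d mass M_200b.
    """
    return calib_factor ** index
```

Under these assumptions, a 10 per cent overestimate of the signal raises M_2d by 10 per cent and M_200b by about 15 per cent.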

One important point regarding the bias given in Table 3
due to the wrong assumed concentration (for converting
M_2d(< R_1) to M_200b) is that it has the same sign as the
bias due to assumption of the wrong concentration when fitting
to ΔΣ(R). Consequently, consistency of the M_200b from
the aperture mass calculation and the NFW fits to ΔΣ(R)
does not tell us whether the assumed concentration is correct.

In summary, we have found that the aperture mass
statistic ζ_c has a strong dependence on the assumed c_200b
when converting the extracted projected masses to 3d M_200b.
An additional problem is that a (much less concentration-dependent)
correction must be used to properly correct for
the term from the outer annulus R_o1 < R < R_o2; otherwise,
the projected masses can be underestimated by tens of per
cent, an effect that is worse for more massive clusters. While
less affected by centroiding errors than fits to ΔΣ(R) that
use scales below 0.5 h^-1 Mpc, the aperture mass statistic can
still be suppressed by roughly ten per cent due to centroiding
errors (or any of the other errors from Section 2.3 that
have a similar form). Finally, it can be substantially noisier,
typically by 50 per cent, than fits to ΔΣ(R) using the same
scales (which means that it is noisier than fits to Υ(R; R_0)).

Table 3. Results of tests of NFW mass recovery for log10[hM_200b/M_⊙] = 14.8 and c_200b = 7 when using the aperture mass statistic ζ_c.
Radii are in h^-1 Mpc. In each group, M_2d denotes M_2d/M_2d,true, M_200b denotes M_200b/M_200b,true, and σ ratio denotes σ(ζ_c)/σ(fit).

                       Neglect second term        Estimate second term
R_1    R_o1  R_o2    M_2d   M_200b  σ ratio     M_2d   M_200b  σ ratio

Assume c_200b = 7
0.275  1.1   2       0.83   0.74    1.18        1.00   1.00    1.40
0.275  1.1   4       0.83   0.74    1.18        1.00   1.00    1.46
0.275  2     4       0.95   0.90    1.43        1.00   1.00    1.54
0.5    1.1   2       0.66   0.58    0.87        1.00   1.00    1.20
0.5    1.1   4       0.66   0.58    0.87        1.00   1.00    1.24
0.5    2     4       0.90   0.86    1.12        1.00   1.00    1.17
1.1    2     4       0.62   0.55    0.71        1.00   1.00    1.01

Assume c_200b = 4
0.275  1.1   2       0.83   1.26    1.60        1.01   1.75    1.95
0.275  1.1   4       0.83   1.26    1.60        1.02   1.80    2.00
0.275  2     4       0.95   1.62    2.01        1.02   1.81    2.24
0.5    1.1   2       0.66   0.74    0.95        1.01   1.35    1.37
0.5    1.1   4       0.66   0.74    0.95        1.02   1.48    1.44
0.5    2     4       0.90   1.21    1.15        1.02   1.50    1.40
1.1    2     4       0.62   0.72    0.68        1.04   1.19    1.08

In principle, these biases due to the concentration-dependence
of the 2d to 3d conversion may be removed
if the conversion from M_2d(< R_1) to M_200b is carried out
using the best-fitting NFW profile from fits to both c_200b
and M_200b, as in Okabe et al. (2009). However, as will be
shown in the next section, these fits tend to be substantially
noisier due to the additional fit parameter, which will
further amplify the noise on the recovered mass from ζ_c.
Thus, this approach is not very advantageous relative to the
fits to Υ(R; R_0), which are similarly insensitive to the assumed
concentration but are only slightly noisier than fits
to ΔΣ(R).



5.2 Profiles from N-body simulations

In this section, we present the results of tests of mass estimation
using cluster profiles measured from the simulations
described in Section 3. The properties of these simulated
cluster samples are summarized in Table 2. We use the signal
from simulations for mass threshold samples selected by taking
all clusters above some M_200b such that n = 0.25, 2, and
16 × 10^-6 (h/Mpc)^3, with the first of these samples shown in
Fig. 4. The samples have mean masses ⟨M_200b⟩ = 7.36, 3.95,
and 1.55 × 10^14 h^-1 M_⊙, though the stitching to NFW profiles
below certain scales as described in Section 3 increases
the mass by several per cent. All comparisons between estimated
M_200b,est and true M_200b,true take this small increase
into account. The error-bars shown in Fig. 4, which include
cosmic variance, are estimated by dividing the eight simulation
boxes each into 20 sub-volumes comparable in size to
that of the maxBCG cluster sample, and finding the variance
of the signal between the 160 total sub-volumes. We
have only shown the case of stitching to NFW profiles with
c_200b = 5 at 0.2 h^-1 Mpc in Fig. 4; when stitching to an NFW
profile with c_200b = 7 at 1 h^-1 Mpc, the signal on smaller
scales is steeper. In the former case, this resolution correction
increases the mass by 1.5 per cent compared to the mass
in the simulations; in the latter case, the correction is 6 per
cent.

Figure 4. Top: Lensing signal RΔΣ(R) from simulations for the
higher mass (lower number density) threshold sample described in
the text. The solid lines with error-bars show the signal stitched
to an NFW profile with c_200b = 5 for r < 0.2 h^-1 Mpc (to remove
resolution effects). Bottom: Ratio of the signal for the best-fitting
NFW profile to the true simulation signal.

In the bottom panel of Fig. 4, we compare ΔΣ(R) from
the simulations to that for the best-fitting NFW profile (determined
by varying both M_200b and c_200b and fitting using
0.2 < R < 2 h^-1 Mpc). As shown, for most of the scales of
interest, the deviations are less than 5 per cent. We see that
the NFW profile overestimates the signal on ~3-8 h^-1 Mpc
scales. This result is consistent with that from Clowe et al.
(2004), who also find that on large scales the density profiles
fall off faster than NFW. The effect is more significant when
expressed in terms of the density profile ρ(r). On the largest
scales shown here, as R approaches 10 h^-1 Mpc, the NFW
profile signal starts to be too low, because the simulation
includes contributions from LSS (again, this effect is more
pronounced in ρ(r) and appears at lower radii).

For the subsections that follow, we have added realistic 
levels of shape noise to the signal, based on calculations of 
the lensing signal using the maxBCG cluster catalogue with 
similar number density samples. 

5.2.1 Parametric modeling 

We begin by showing the effects of parametric modeling of
the lensing profiles from simulations. We use the three aforementioned
mass threshold samples, with the two methods of
connecting to NFW profiles (Section 3) to correct for resolution
effects: c_200b = 5 at r = 0.2 h^-1 Mpc, and c_200b = 7
at r = 1 h^-1 Mpc. We then fit to ΔΣ(R) and Υ(R; R_0) with
R_0 = 0.25 and 0.5 h^-1 Mpc, with varying R_min and R_max,
for our two extreme concentration values of c_200b = 4 and
c_200b = 7. The fitting procedure is the same as for the analytic
profiles in Section 5.1.1. Fig. 5 shows the results of
these fits for the highest and lowest of the mass threshold
samples.

The important point to consider in this plot is that we
would like the output mass from a given estimator to be relatively
insensitive to the form of the inner profile (represented
by the two different connections to NFW profiles on small
scales) and to the assumed concentration. Furthermore, we
would like it to be only weakly dependent on the mass, assuming
that corrections for systematic bias will be derived
from simulations, but that strong mass dependence may be
difficult to calibrate out correctly. Consequently, what we
hope to see in an optimal estimator of cluster mass is that
all the lines on a given panel (representing the results with
different input profiles, assumed concentrations, and masses)
give very similar results; we do not want to use an estimator
that has large scatter between the lines. So, for example, the
lower left panel shows, as we already saw with pure NFW
profiles in Section 5.1, that fitting ΔΣ(R) to NFW profiles
with R_max = 1 h^-1 Mpc and a fixed concentration leads to
very large systematic uncertainties, more than a factor of 2
total range in the best-fit masses. As we increase R_max, we
become less sensitive to the inner details of the profile, so
the scatter between the lines becomes less significant, but for
R_min ≤ 1 h^-1 Mpc they still cover a range of ~40 per cent
in mass even for R_max = 4 h^-1 Mpc, well outside the virial
radius. For R_min = 2 and R_max = 4 h^-1 Mpc, the systematic
uncertainty is only ~10 per cent; however, the statistical
error on the mass (not shown on this plot) has roughly doubled
relative to the results with R_min < 0.5 h^-1 Mpc.

In contrast, we see that Υ(R; R_0 = 0.25 and
0.5 h^-1 Mpc), in the right panels in Fig. 5, performs quite
well. The difference between the two mass threshold samples
suggests that a larger R_0 ~ R_min is preferable for samples
with larger halo masses, with minimal profile-related systematics
for R_0 = 0.5 h^-1 Mpc for the sample with a mass
above 7 × 10^14 h^-1 M_⊙, and R_0 = 0.25 h^-1 Mpc for the sample
with a mass around 1.6 × 10^14 h^-1 M_⊙ (and therefore smaller
scale and virial radii). While the cluster mass is not known
a priori, a preliminary fit with one choice of R_0 could be
used to estimate an approximate mass, and then a new R_0
could be chosen to be around 1/4 to 1/5 of the virial radius,
provided that this scale is reliable from the perspective of
small-scale systematics (Section 2.3).

In all cases, Υ(R; R_0) does not converge to the true
mean mass, for two reasons: (1) the lensing signal includes
a small but non-negligible contribution due to large-scale
structure on the scales we have used, leading to an overestimation
of M_200b,est; and (2) even on scales where LSS
is not important, the simulation profiles fall off faster than
the NFW model, which somewhat counteracts the previous
effect. Fortunately, since it is relatively insensitive to the
inner details of the profile, the assumed concentration, and
the mass, this systematic positive bias in the masses can be
calibrated out using simulations, whereas systematic uncertainty
in ΔΣ(R)-based mass estimates due to concentration
assumptions and small-scale effects cannot be calibrated out
in this way.

Some differences in these results from Section 5.1 can
be attributed to the LSS in the simulations that was not
put into the pure NFW profiles, and to the fact that the
simulation profiles are not strictly NFW profiles. So, for example,
in Fig. 2, the results for fitting to ΔΣ(R) converge
to the true mass on large scales if the right concentration is
assumed, whereas the fitting to ΔΣ(R) in simulations converges
to a mass that is too high by 5 to 10 per cent when
using the largest scales only.

As in Section 5.1, we point out that for a stacked cluster
sample, the level of variation we have allowed in the assumed
c_200b is likely excessive from the standpoint of N-body simulations.
However, given the systematic profile changes that
may occur due to baryonic effects, centroiding errors, and
intrinsic alignments, the variation we have assumed is not
entirely unreasonable. For fits to individual cluster lensing
data, the variation we have assumed is quite reasonable, and
possibly even an underestimate of the true variation, given
the large lognormal scatter in cluster concentrations in N-body
simulations plus these other systematics that change
the profile on small scales.

We also estimate the effects of centroiding errors on the
parametric mass recovery. As for the theoretical profiles, we
use the model for centroiding errors given in Johnston et al.
(2007), with offset fractions of 20 and 25 per cent for the
lower and higher abundance thresholds, respectively.

Here we describe how centroiding errors modify the
curves that were shown in Fig. 5. As we have seen before, the
offsets suppress masses estimated directly from ΔΣ(R), with
larger biases when restricting to smaller scales. Furthermore,
the profiles with more mass in the inner regions are more
strongly affected. For example, the simulation signal stitched
to NFW with c_200b = 7 at 1 h^-1 Mpc is more strongly affected
than the signal stitched to c_200b = 5 at 0.2 h^-1 Mpc.
Given that the former resulted in mass estimates that were
above the masses estimated from the latter when fitting to
ΔΣ(R) (without offsets, Fig. 5) by up to tens of per cent
depending on the value of R_max, the net effect of offsets is
to lower all estimated masses while also reducing the
difference between the curves, since those with the two stitched
profiles now tend to agree more closely. For example, when
using R_max = 1 h^-1 Mpc, the values of M_200b,est/M_200b,true
without including centroiding errors in the modeling range
from 0.6 to 1.9 (factor of three). Centroiding errors in the
input data reduce the range of M_200b,est/M_200b,true to 0.4 to
0.9 (factor of two), where the main cause of this variation
is the assumed value of c_200b rather than the input profile.
For R_min = 0.5 and R_max = 4 h^-1 Mpc, M_200b,est/M_200b,true
ranges from 0.9 to 1.25 when we do not include centroiding
errors, whereas when we include them, it ranges from 0.8 to
1. As we have seen before when using pure NFW profiles,
Υ(R; R_0) with R_min = R_0 = 0.5 h^-1 Mpc is almost completely
insensitive to this model for centroiding errors when
using R_max = 4 h^-1 Mpc (masses are suppressed at the 10
per cent level with R_max = 2 h^-1 Mpc). This insensitivity to
such systematics makes the ADSD statistic Υ(R; R_0) the optimum
choice for parametric mass fitting on stacked clusters
selected from imaging data, which is prone to centroiding
errors of this variety.

Fits to signal from simulations
Figure 5. Results for M_200b,est/M_200b,true as a function of the minimum fit radius, R_min, from parametric fits to the lensing signal from
simulations, for seven different combinations of observable (ΔΣ(R) or Υ(R; R_0)) and R_max shown separately in each panel. As indicated
in the legend, line colours and types are used to indicate the mass scale, whereas point styles are used to indicate the input signal (which
NFW profile was used to correct for resolution effects) and assumed concentration in the fits, either c_200b = 4 or 7. The horizontal dotted
line on each panel shows the ideal unbiased result. The vertical axis is the same for all panels in the left column, and spans a smaller
range for all panels in the right column so that the details will be more visible.

In principle, explicit modeling of the offset distribution,
as in Johnston et al. (2007), can remove its effects when
fitting to ΔΣ(R). However, the exact results may be sensitive
to the details of the centroiding model used and its accuracy
when compared to the true distribution, which is not
typically well known. For example, that paper uses mock
simulations to estimate the centroiding error distribution,
which means that this model is quite sensitive to the realism
of the model for populating the simulation dark matter
halos with galaxies. Furthermore, the other systematic
uncertainties associated with using ΔΣ(R) (e.g., sensitivity to
baryonic effects and intrinsic alignments) remain, whereas
their influence on Υ(R; R_0) is much smaller.

Another issue we consider is the effect of overall lensing
signal calibration biases on the estimated masses. As a test,
we use the signals from simulations multiplied by factors of
0.9 and 1.1, and refit for the masses. The results are used
to estimate a power-law relation M_200b ∝ ΔΣ^η, and η is
determined for the different mass scales, stitched signals,
assumed concentrations, fit method (ΔΣ(R) or Υ(R; R_0)),
and minimum and maximum fit radii. Note that η is also
dependent on the spherical over-density used to define the
profile, though we do not explore this effect in detail. A naive
scaling of surface mass density with mass predicts η = 1.5,
but other effects will modify this. The results of this test are
shown in Fig. 6.

Figure 6. Results for η, the scaling of the estimated M_200b with the lensing signal calibration, as a function of the minimum fit radius,
R_min, from parametric fits to the lensing signal from simulations, for five different combinations of observable (ΔΣ(R) or Υ(R; R_0)) and
R_max shown separately in each panel. We only show η for the highest and lowest mass threshold sample.
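The calibration test described above amounts to measuring a
logarithmic slope from two refits. The following is a minimal
sketch of that step, assuming the power-law form
M_200b ∝ ΔΣ^η holds between the two rescaled signals; the
function name and the toy input masses are hypothetical and
are not taken from the analysis.

```python
import numpy as np


def calibration_exponent(mass_low, mass_high, factor_low=0.9, factor_high=1.1):
    """Estimate eta in M ∝ ΔΣ^eta from masses refit to rescaled signals.

    mass_low / mass_high are the best-fitting masses obtained after
    multiplying the lensing signal by factor_low / factor_high.
    """
    return np.log(mass_high / mass_low) / np.log(factor_high / factor_low)


# Toy inputs (hypothetical): masses generated from an exact power law
# with eta = 1.42, purely to show that the estimator recovers the slope.
eta_in = 1.42
m_lo = 1.0 * 0.9**eta_in
m_hi = 1.0 * 1.1**eta_in
eta = calibration_exponent(m_lo, m_hi)
```

In practice the two masses would come from the actual refits,
and η would be evaluated separately for each combination of
fit method and radial range.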

As shown, η is a decreasing function of R_min and R_max.
When fitting to ΔΣ(R), η does not depend on the details of
the inner profile, and is larger for higher masses and lower
assumed c_200b, with the dependence on c_200b being the more
significant of the two. For example, when fitting to ΔΣ(R)
for the lower mass sample from simulations stitched to an
NFW profile with c_200b = 5 at 0.2 h^-1 Mpc, using 0.5 < R <
4 h^-1 Mpc for the fits, we find that M_200b ∝ ΔΣ^1.42. In
contrast, when fitting to Υ(R; R_0) with R_0 = 0.5 h^-1 Mpc, the
trends in the fitted mass with calibration are stronger for a
given combination of (R_min, R_max). Here, we see that there
is minimal dependence on mass, and some small dependence
on the details of the profile and the assumed concentration.
For the same case considered when fitting to ΔΣ(R), we find
η = 1.75, an increase of 23 per cent. Consequently, systematic
errors in masses from Υ(R; R_0) due to miscalibration of the
lensing signal are larger than those in masses from ΔΣ(R)
(assuming that other aspects of the fit, such as R_min and
R_max, are similar).

Next, we briefly discuss the effects of allowing both c_200b
and M_200b to vary, rather than fixing c_200b, as in Okabe et al.
(2009). While this procedure has the disadvantage of
increasing the statistical errors on the mass, it does allow for
improved mass recovery. Our results suggest that with NFW
fits to ΔΣ(R) with 0.5 < R < 4 h^-1 Mpc, the degeneracy
between M_200b and c_200b is a power law,
M_200b ∝ c_200b^[exponent illegible]. This
result explains the magnitude of the deviations from the
true mass when the concentration is fixed to a value that is
not consistent with the best-fitting concentration (though,
again, the deviations in concentration we have tested are
not sufficiently bad that the fit χ^2 values reveal a clear
discrepancy). In contrast, the exponent on that scaling between
M_200b and c_200b is far closer to zero when fitting to Υ(R; R_0)
with R_0 = 0.5 h^-1 Mpc using the same scales:
M_200b ∝ c_200b^[exponent illegible].
This degeneracy becomes more striking when the fits are
restricted to smaller scales, e.g.,
M_200b ∝ c_200b^[exponent illegible] when fitting to
ΔΣ(R) using 0.1 < R < 1 h^-1 Mpc.

When fitting the simulation signals with both M_200b
and c_200b as free parameters, we find that even when
centroiding errors are included in the data, the fits are able to
recover the masses for both mass scales and inner profiles, for
several types of fits that we attempted (using ΔΣ(R) from
0.5 < R < 4 h^-1 Mpc, from 0.1 < R < 1 h^-1 Mpc, and using
Υ(R; R_0) with R_0 = 0.5 and 0.5 < R < 4 h^-1 Mpc). When
using data from 0.5 < R < 4 h^-1 Mpc, the deviations of the
signal in simulations from an NFW profile led to best-fitting
masses that are 5 per cent higher than the true masses; when
fitting from 0.1 < R < 1 h^-1 Mpc, the best-fitting masses
are only ∼ 2 per cent higher than the true ones, because the
deviations from NFW are not as striking on those scales,
and the large-scale structure term is also more negligible.
The mass estimates tend to be noisier in this case, and the
concentrations that are recovered are highly suppressed
relative to the true concentrations when centroiding errors are
included (e.g., from best-fitting c_200b ∼ 5 and ∼ 6.5 without
centroiding errors, down to 3 and 3.5 with centroiding
errors). We find that these two-parameter fits for mass and
concentration lead to statistical errors in the masses that
are larger than the errors in one-parameter fits by
approximately 45 per cent. This increase is larger than the increase
when fitting to Υ(R; R_0) (14 or 32 per cent, for R_0 = 0.25 or
0.5 h^-1 Mpc), and Υ(R; R_0) has the additional advantage of
removing the impact of small-scale systematics, which would
still be present when fitting ΔΣ(R) to the two-parameter
model.

Finally, when fitting with free M_200b and c_200b, the
dependence of the best-fitting masses on the lensing signal
calibration is reduced. For example, when fitting to ΔΣ(R)
using 0.5 < R < 4 h^-1 Mpc and fixed concentration, we had
found previously that M_200b ∝ ΔΣ^1.42. When the
concentration is allowed to vary, that exponent becomes η = 1.25.
This change results from the fact that if the signal increases,
the assumed mass and therefore r_200b increases as well, so
for a fixed scale radius determined from the data, the
concentration would naturally tend to increase. When fitting to
Υ(R; R_0), η is not affected by whether or not the
concentration is fixed.

In summary, we have found that Υ(R; R_0) is the optimal
statistic for parametric mass modeling given its insensitivity
to the profile at small scales. For the cluster masses used
here, R_0 = 0.25-0.5 h^-1 Mpc gives a reasonable compromise
between reducing systematic error and retaining reasonable
S/N on the recovered masses, but
in general the choice of R_0 will depend on the specific
application one has in mind and on the scales to which the data
can be considered relatively systematics-free. This statistic
tends to slightly overestimate the mass due to the
combination of two competing effects: the profile deviation from
NFW on large scales, and the neglected large-scale structure
contribution to the lensing signal. However, these effects are
only very weakly dependent on the details of the profile,
the mass, and the cosmology, making them easy to calibrate
out at the few per cent level using N-body simulations. This
result is in stark contrast to the effect of small-scale
systematics on the masses estimated from ΔΣ(R) (e.g., varying
concentrations, and deviations from NFW due to intrinsic
alignments and baryonic effects), which lead to larger
systematic uncertainties in the recovered masses. These
conclusions hold in cases where the NFW concentration is fixed.
If it is allowed to vary, then the statistical errors will
increase more than when using Υ(R; R_0) with a reasonable
R_0, but systematic errors decrease, provided that the
systematic errors in the lensing signal appear reasonably similar
to a change in NFW concentration, which is not the case for
several of the small-scale systematics in Section 2.3.
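The insensitivity of Υ(R; R_0) to the inner profile can be
illustrated with a toy calculation. The sketch below assumes
the ADSD definition Υ(R; R_0) = ΔΣ(R) - (R_0/R)^2 ΔΣ(R_0),
which is not restated in this section, and uses two
hypothetical signals in arbitrary units: a smooth 1/R profile,
and the same profile plus a compact central mass, for which
ΔΣ = M/(πR^2) outside the mass.

```python
import numpy as np


def upsilon(R, delta_sigma, R0):
    """ADSD statistic for a callable delta_sigma(R) (assumed definition)."""
    R = np.asarray(R, dtype=float)
    return delta_sigma(R) - (R0 / R) ** 2 * delta_sigma(R0)


def ds_smooth(R):
    # Toy isothermal-like signal, ΔΣ ∝ 1/R (arbitrary units).
    return 100.0 / R


def ds_contaminated(R, m_extra=50.0):
    # Same signal plus a compact central mass: adds M / (π R^2).
    return ds_smooth(R) + m_extra / (np.pi * R**2)


R0 = 0.5
R = np.array([0.5, 1.0, 2.0, 4.0])
ups_smooth = upsilon(R, ds_smooth, R0)
ups_contaminated = upsilon(R, ds_contaminated, R0)
```

The two ΔΣ signals differ at every radius, whereas the two
Υ(R; R_0) vectors agree, because the 1/R^2 contribution of mass
enclosed within R_0 cancels exactly in the subtraction.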



5.3 Example application with data 

Here we consider the maxBCG cluster lensing data in six
scaled richness bins (12 ≤ N_200 ≤ 79), which were previously
used in Mandelbaum et al. (2008a) for joint estimation of
the concentration-mass relation and the mass-richness
relation. Here, we use several examples of fixed
concentration-mass relations and several of the fitting methods considered
in the previous sections to estimate the mass-richness
relation, always with Ω_m = 0.25. This estimation proceeds as follows:

• We generate 200 bootstrap-resampled datasets to
estimate the noise in the data. For this bootstrap procedure, the
data are divided into 200 regions on the sky which are
bootstrapped (rather than bootstrapping the individual lenses).
More details on this procedure are given in Mandelbaum et al.
(2008a).

• For each dataset, we separately fit the data in each
richness bin for M_200b assuming some c_200b(M_200b) relation
and fit method. The choice of
fit method includes specifying the statistic to fit and the
range of transverse separations to use. Thus, given
logarithmic bins in transverse separation denoted i (R_i), dataset j,
richness bin k, statistic for a given fit method ℓ (denoted S
for S = ΔΣ or Υ(R; R_0)), and c_200b(M_200b) relation m, we
use the Levenberg-Marquardt algorithm to separately
minimize the j × k × ℓ × m values of χ^2 defined as follows:

\chi^2_{jk\ell m} = \sum_i \frac{\left[ S^{\rm (data)}_{jk\ell}(R_i) - S^{\rm (model)}_{k\ell m}(R_i) \right]^2}{\sigma^2\left(S_{k\ell}(R_i)\right)}    (21)

where we use i such that R_min,ℓ ≤ R_i ≤ R_max,ℓ. The result
of this procedure is a matrix with j × k × ℓ × m values of
M_200b, where in practice we use j = 200, k = 6, ℓ = 3, and
m = 3. The fit methods and concentration-mass relations
are described in detail below.

• The set of k M_200b values for a given dataset (j), fit
method (ℓ), and concentration-mass relation (m) are used
to fit for a power-law relation between scaled richness and
halo mass:

M_{200b}(N_{200}) = \left[ M_{200b,20} \times 10^{14}\, h^{-1} M_\odot \right] \left( \frac{N_{200}}{20} \right)^{\gamma}    (22)

This fit has two parameters: an amplitude M_200b,20 that is
the mass at our pivot richness of N_200 = 20 in units of
10^14 h^-1 M_⊙, and an exponent γ. We find the best-fitting
values of M_200b,20 and γ for each (j, ℓ, m) by minimizing
χ^2_jℓm (Eq. 23). The result is a matrix with j × ℓ × m values
of M_200b,20 and γ.

• We use the list of j power-law fits for each
bootstrap-resampled dataset to estimate the mean and variance of
M_200b,20 and γ for a given combination of fit method ℓ and
concentration-mass relation m.
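The last two steps of the procedure above can be sketched as
follows. This is an illustration under stated assumptions
rather than the fitting code used for the analysis: it replaces
the χ^2 minimization of Eq. (23) with an unweighted
least-squares fit in log space, and the richness values, noise
level, and function names are hypothetical.

```python
import numpy as np


def fit_power_law(n200, masses, pivot=20.0):
    """Fit M = M20 * (N200 / pivot)^gamma by linear least squares in logs.

    Masses are in units of 1e14 h^-1 Msun. Returns (M20, gamma).
    """
    x = np.log(np.asarray(n200, dtype=float) / pivot)
    y = np.log(np.asarray(masses, dtype=float))
    gamma, ln_m20 = np.polyfit(x, y, 1)  # slope first, then intercept
    return np.exp(ln_m20), gamma


def summarize_bootstrap(n200, mass_matrix):
    """mass_matrix[j, k] holds the mass for dataset j and richness bin k.

    Returns the mean and standard deviation of (M20, gamma) over datasets.
    """
    fits = np.array([fit_power_law(n200, row) for row in mass_matrix])
    return fits.mean(axis=0), fits.std(axis=0, ddof=1)


# Hypothetical inputs: six mean richnesses and 200 noisy realizations
# of a known power law, standing in for the bootstrap-resampled fits.
rng = np.random.default_rng(0)
n200 = np.array([13.0, 17.0, 22.0, 30.0, 42.0, 60.0])
truth = 1.5 * (n200 / 20.0) ** 1.15
mass_matrix = truth * rng.lognormal(mean=0.0, sigma=0.1, size=(200, 6))
mean, err = summarize_bootstrap(n200, mass_matrix)
```

Here `mean` and `err` play the role of the quoted values and 1σ
errors of M_200b,20 and γ for one choice of fit method and
concentration-mass relation.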

We include m = 3 concentration-mass relations in our
tests: a power law with

c_{200b} = 5 \left( \frac{M_{200b}}{10^{14}\, h^{-1} M_\odot} \right)^{-0.1}    (24)

consistent with Mandelbaum et al. (2008a); a constant
c_200b = 4; and a constant c_200b = 7. We examine the results
for ℓ = 3 fit methods: an extreme one assuming a small
aperture for the cluster data, using ΔΣ(R) from 0.1-1 h^-1 Mpc;
using ΔΣ(R) from 0.5-4 h^-1 Mpc; and using Υ(R; R_0) with
R_0 = 0.5 h^-1 Mpc from 0.5-4 h^-1 Mpc, given its good
performance on theoretical profiles and simulations in the previous
sections. We consider the fit results first without and then
with correction factors derived from Fig. 5. While we only
have simulation-based correction factors for samples with
three mean masses (which are threshold samples, not
discrete mass bins as in this example) and two concentrations,
we interpolate those results to derive approximate
corrections for all the fits done in this section.

The final type of correction that we apply is a
calibration factor that reduces the lensing signal calibration
from Mandelbaum et al. (2008a) and Reyes et al. (2008)
by 6 per cent. The reason for this correction is that for
30 per cent of the spectroscopic training set presented in
Mandelbaum et al. (2008b) for calibration of photometric
redshifts that are used to estimate the lensing signal, an
incorrect photometric calibration was used when computing
the photometric redshifts. We emphasise that this incorrect
calibration was only used for the kphotoz photometric
redshifts, not for any other photometric redshift sample, and
thus the lensing signal calibrations that are quoted for other
photometric redshift methods in that paper are correct. As
a result of this error, the calibrations from kphotoz which
were used for the data in Mandelbaum et al. (2008a) and
Reyes et al. (2008) that we analyse here were 6 per cent
too high, so we now apply a correction to the signal. We
then present the results for the best-fitting masses after
application of both the signal calibration correction and the
simulation-based correction factors due to the mass
estimation method.

Fig. 7 shows the observed signal for the lowest and
highest richness bins for 0.1 < R < 4 h^-1 Mpc, and the
theoretical signal from the fits. This theoretical signal is derived
by taking the best-fitting mass-richness relation, evaluating
it at the mean richness of the bins that are shown, and
using the resulting mass and assumed concentration to define
the theoretical signal. The fits did not use only the data
shown on the plot, because the requirement of a power-law
mass-richness relation means that the theoretical signal at
the richness bins shown was also influenced by the data in
all other bins.

For reference, given a best-fitting mass-richness relation
from Eq. (22) with M_200b,20 = 1.55 and γ = 1.15 (which is
a typical value given the scatter between the results in
Table 4), the combination with Eq. (24) gives a
concentration-richness relation of

c_{200b} = 4.78 \left( \frac{N_{200}}{20} \right)^{-0.115}    (25)

Thus, within our richness range of 12 ≤ N_200 ≤ 79, the
concentrations vary from 5 to 4 as we move from the lowest
to the highest richnesses. When we instead fix c_200b = 4
independent of mass, we lower the concentrations at the low
N_200 end of the sample by 20 per cent, without changing
the concentrations at the very high mass end. When we fix
c_200b = 7, then we raise all the concentrations by a very
significant amount, from ∼ 40 per cent increases at the low
mass end to 75 per cent at the high mass end. The results for
the three concentration-mass relations and fitting methods
are given in Table 4.
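The step from Eqs. (22) and (24) to Eq. (25) is a substitution
of one power law into another, which the short sketch below
makes explicit. It assumes a normalisation of 5 and slope of
-0.1 for Eq. (24); the normalisation is inferred here from the
quoted amplitude of Eq. (25) and the stated range of
concentrations, and the function name is hypothetical.

```python
def concentration_richness(m20=1.55, gamma=1.15, c0=5.0, beta=-0.1):
    """Combine M = m20 * (N/20)^gamma with c = c0 * M^beta.

    Masses are in units of 1e14 h^-1 Msun. Returns the amplitude and
    exponent of the resulting relation c = amplitude * (N/20)^exponent.
    """
    amplitude = c0 * m20**beta
    exponent = beta * gamma
    return amplitude, exponent


amp, expo = concentration_richness()

# Concentrations at the edges of the richness range 12 <= N200 <= 79.
c_low_richness = amp * (12 / 20) ** expo
c_high_richness = amp * (79 / 20) ** expo
```

With these inputs the amplitude is close to 4.78 and the
exponent is -0.115, and the concentration falls from roughly 5
at N_200 = 12 to roughly 4 at N_200 = 79, as stated in the text.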

We begin by discussing the first fit method, using
ΔΣ(R) from 0.1 < R < 1 h^-1 Mpc. As we have noticed in
previous examples with the theoretical profiles and
simulations, the results using these scales are highly sensitive to the
assumed concentration-mass relation. We see that changing
the assumed concentration among our three options leads to
50 per cent variation of the amplitude M_200b,20, significantly
larger than the statistical errors on this parameter, when we
do not impose corrections from the simulations. The
exponent γ undergoes 20 per cent changes, which are roughly
consistent with the size of the 1σ statistical errors. The
changes in this exponent can be easily understood as follows.
First, if we change from the power-law concentration-mass
relation in Eq. (24) to fixed c_200b = 4, then we are lowering
the assumed concentration for all but the highest mass halos.
This means that, due to the typical concentration-mass
anti-correlation when fitting ΔΣ, the best-fitting masses should
increase at the lower mass end. As a result, the best-fitting
mass-richness relation becomes less steep. When we change
to use a higher concentration c_200b = 7, then due to this
concentration-mass anti-correlation, the best-fitting masses
are significantly suppressed (which explains the large change
in M_200b,20). Furthermore, this suppression is stronger at the
higher mass end, where the difference between c_200b = 7 and
Eq. (24) is most pronounced. This trend will tend to
suppress γ, as is seen in the table.

When we impose corrections from the simulations to the
results from the first fit method, we find that the variation in
M_200b,20 and γ when we change the assumed concentration
is significantly reduced. However, there is still 30 per cent
level variation, which may be ascribed to the fact that the
scales that are used in this fit are quite prone to systematics 
such as intrinsic alignments and centroiding errors, which 
will affect the fits with different assumed concentrations in 
different ways. The simulation corrections can only correct 
for the fitting methods' different responses to a theoretical 
cluster lensing profile, not for their different responses to 
additional systematics that may be present in the data. 

When we fit using ΔΣ(R) from 0.5 < R < 4 h^-1 Mpc,
we find smaller variations in the (uncorrected) amplitude
M_200b,20 of the mass-richness relation when we change the
concentration-mass relation, at most 13 per cent, which is
still problematic since it is close to twice the 1σ
statistical error. (However, note that the fit χ^2 are not sufficiently
different to rule out any of these three models; the lensing
data only weakly constrain the concentration.) The trends
in γ with c_200b(M_200b) have the same sign as when fitting
using ΔΣ(R) from 0.1 < R < 1 h^-1 Mpc, but are less
pronounced (11 per cent variation, slightly smaller than the 1σ
statistical error). Because of the longer range in transverse
separation, the statistical errors on the fit parameters have
become smaller, though we do not fully benefit from this
fact due to the systematic uncertainties. We also note that
for a given concentration-mass relation, such as Eq. (24),
the amplitude M_200b,20 is increased by 4 per cent relative to
the previous results. This increase may be due to
systematics that decrease the signal on scales below 0.5 h^-1 Mpc,
such as intrinsic alignments or centroiding errors. The fact
that γ has decreased relative to the 0.1 < R < 1 h^-1 Mpc

Figure 7. Observed lensing signal from Mandelbaum et al. (2008a) for stacked maxBCG clusters, presented as ΔΣ(R) and Υ(R; R_0)
with R_0 = 0.5 h^-1 Mpc in the top and bottom panels, respectively. We show the lowest (left) and highest (right) richness bins out of the
six used for the analysis. In addition to the data with bootstrap error-bars, we also show four theoretical signals labelled on the plot.
Two of them were derived by fitting ΔΣ(R) using R_min = 0.1 and R_max = 1 h^-1 Mpc with different assumed concentrations; the other
two, by fitting Υ(R; R_0) with R_0 = R_min = 0.5 and R_max = 4 h^-1 Mpc. The 6 per cent calibration correction described in the text has
been applied. Because we required a power-law relationship between mass and richness, the best-fitting signals shown for these two bins
were influenced by the data in the other richness bins (not shown).



Table 4. Results of power-law fits for a mass-richness relation using stacked maxBCG cluster lensing data, using three fit methods and
three concentration-mass relations. We first present the best-fitting masses; then, we include corrections for the bias on the mass estimation
from simulations (Fig. 5); finally, we show results with both the simulation corrections and a 6 per cent decrease of the amplitude of the lensing signal,
as described in the text.

                                         No correction               Simulation correction   Sim. and photo-z corrections
Fit method              c_200b(M_200b)   M_200b,20     γ             M_200b,20   γ           M_200b,20     γ

ΔΣ(R),                  Eq. (24)         1.64 ± 0.20   1.24 ± 0.35   1.31        1.10        1.19 ± 0.10   1.10 ± 0.28
0.1 < R < 1 h^-1 Mpc    c_200b = 4       1.58 ± 0.15   1.07 ± 0.26   1.14        1.01        1.04 ± 0.09   1.01 ± 0.24
                        c_200b = 7       1.01 ± 0.08   0.93 ± 0.22   1.16        0.98        1.06 ± 0.09   0.98 ± 0.24

ΔΣ(R),                  Eq. (24)         1.72 ± 0.13   1.18 ± 0.18   1.56        1.14        1.44 ± 0.10   1.14 ± 0.17
0.5 < R < 4 h^-1 Mpc    c_200b = 4       1.70 ± 0.12   1.14 ± 0.16   1.51        1.11        1.40 ± 0.10   1.10 ± 0.16
                        c_200b = 7       1.52 ± 0.10   1.06 ± 0.15   1.46        1.11        1.35 ± 0.09   1.11 ± 0.16

Υ(R; R_0),              Eq. (24)         1.79 ± 0.18   1.20 ± 0.24   1.67        1.20        1.50 ± 0.16   1.21 ± 0.24
R_0 = 0.5 h^-1 Mpc,     c_200b = 4       1.81 ± 0.18   1.18 ± 0.23   1.75        1.16        1.56 ± 0.16   1.17 ± 0.23
0.5 < R < 4 h^-1 Mpc    c_200b = 7       2.02 ± 0.19   1.11 ± 0.21   1.73        1.17        1.56 ± 0.16   1.17 ± 0.23

results suggests that the change in masses is more significant
at lower richness than at higher richness.

When we impose corrections from the simulations in
Fig. 5 to the results of this second fit, we find that the total
range of M_200b,20 and γ values is quite small, roughly 7 and 3
per cent respectively. This finding is encouraging: it suggests
that we may be converging to a result that is more robust
to small-scale systematics. Since the typical corrected mass
from this fit method is 25 per cent higher than that for the
fits using 0.1 < R < 1 h^-1 Mpc, we conclude that the fits
that use those smaller scales may be significantly influenced
by small-scale systematics.

Finally, we consider the results of fits to Υ(R; R_0) with
R_0 = 0.5 using 0.5 < R < 4 h^-1 Mpc. First, we see that
the statistical errors on fit parameters are larger than when
using ΔΣ(R) for the same scales (50 per cent larger,
comparable to or smaller than the errors when using ΔΣ(R) from
0.1 < R < 1 h^-1 Mpc). This trend in the errors may seem
inconsistent with the results in the simulations, which
suggested a ∼ 30 per cent increase in mass estimation statistical
errors. However, the 50 per cent increase is for the
power-law amplitude that comes from using 6 mass bins. On each
individual mass bin, the mass uncertainties increase by 30
per cent when using Υ(R; R_0) with R_0 = R_min = 0.5 and
R_max = 4 h^-1 Mpc relative to ΔΣ(R) with R_min = 0.5 and
R_max = 4 h^-1 Mpc. Thus, the increase in mass uncertainties
that we see here for individual mass bins is consistent with
that in the simulations. Second, the variation in the
uncorrected M_200b,20 when
we change the concentration-mass relation is 11 per cent,
comparable to the 1σ errors, though we emphasise again
that the variations in concentration that we have allowed
are relatively extreme compared to what is seen in
simulations. The variation in γ is 7 per cent, more than a factor
of two smaller than the statistical error. The sense of the
change in M_200b,20 when changing c_200b(M_200b) is the
opposite of that when fitting to ΔΣ(R), as we have seen before in the
simulations.

When we use the simulation results to correct these
final fits that use Υ(R; R_0), we see that the corrections again
reduce the spread in the best-fitting M_200b,20 and γ
values when we use different concentration-mass relations. The
residual 4 per cent variation in both fit parameters is well
below the statistical error. We note that the typical mass
M_200b,20 at richness N_200 = 20 has increased by 10 per cent
relative to the fits using ΔΣ(R) on the same exact scales,
even after the imposition of the correction from simulations
in Fig. 5. We suggest that this change may result from
low-level residual contamination of ΔΣ(R) due to systematics
such as centroiding errors even for R > R_min = 0.5 h^-1 Mpc.
Such contamination can, as we have shown, bias fits to
ΔΣ(R) while not affecting fits to Υ(R; R_0). Thus, we adopt
our mass normalisation at the pivot richness N_200 = 20 as
M_200b,20/(10^14 h^-1 M_⊙) = 1.54 ± 0.16 (stat.) ± 0.06 (sys.),
the mean of the values from the fits to Υ(R; R_0) with the
different concentration-mass relations. This systematic error
results from an uncertainty of 0.03 due to uncertainties in
the mass estimation due to both the assumed and true
profile, added in quadrature with the lensing signal calibration
uncertainty of 0.05.
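The arithmetic behind the adopted normalisation can be checked
directly. The sketch below takes the three fully corrected
Υ(R; R_0) amplitudes from Table 4 and the two systematic terms
quoted above; the variable names are illustrative only.

```python
import math

# Fully corrected M_200b,20 from the three fits to Υ(R; R_0) in Table 4
# (Eq. 24, c_200b = 4, c_200b = 7), in units of 1e14 h^-1 Msun.
amplitudes = [1.50, 1.56, 1.56]
mean_amplitude = sum(amplitudes) / len(amplitudes)

# Systematic terms quoted in the text, combined in quadrature:
# mass-estimation (profile) uncertainty and lensing calibration uncertainty.
sys_profile = 0.03
sys_calibration = 0.05
sys_total = math.hypot(sys_profile, sys_calibration)
```

The mean reproduces the adopted value of 1.54, and the
quadrature sum of 0.03 and 0.05 is about 0.058, which rounds to
the quoted 0.06.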

We now compare these results against the M_200b(N_200)
relations determined in several previous papers. First, we
compare against that from Mandelbaum et al. (2008a),
which used these data to fit for a concentration-mass
and mass-richness relation. Given that the best-fitting
concentration-mass relation in that paper was quite similar
to our Eq. (24), and that the fits in that paper used ΔΣ(R)
from 0.5 < R < 3 h^-1 Mpc, we expect quite similar results
to the results in this paper using c_200b(M_200b) from Eq. (24)
and ΔΣ(R) from 0.5 < R < 4 h^-1 Mpc. In that paper, we
found M_200b,20 = 1.55 and γ = 1.14. The mass
normalisation is quite similar to what we quote here, because (a) in
Mandelbaum et al. (2008a) the masses were reduced by
approximately 10 per cent due to small-scale systematics (from
the use of ΔΣ(R) rather than Υ(R; R_0)), but (b) the lensing
signal amplitude was too high by 6 per cent, as explained
above, which raised the best-fit mass by a factor of 1.06^1.4,
a 9 per cent difference.

Reyes et al. (2008) used the maxBCG cluster lensing
data to estimate a mass-richness relation. That work used
fits to ΔΣ(R) from 0.5-4 h^-1 Mpc assuming Eq. (24) for
the concentration-mass relation, with the same source shape
measurements, shear calibration, and source redshift
distribution calibration as in this paper. However, the richness
range used in that paper was slightly different, since it used
the entire public catalogue from the minimum N_200 = 10 to
the maximum scaled richness. Furthermore, the binning into
richness bins within the range that is shared by this work
and that one was different. Finally, as for Mandelbaum et al.
(2008a), they explicitly modelled the halo-halo term using
the same halo model formalism and assumed mass-bias
relation. Their result was a best-fitting mass-richness power-law
with M_200b,20 = 1.42 and γ = 1.16. Thus, the calibration is
8 per cent lower than the value we have adopted here, but
this could be attributed to differences in richness ranges.

Finally, we compare against the fits to the maxBCG
catalogue cluster lensing signal in Johnston et al. (2007). The
differences in procedure compared to this paper are
numerous. First, the richness range is different, because they use
a non-public version of the catalogue that extends down to
N_200 = 3. They fit to ΔΣ(R) using 0.05 ≤ R < 30 h^-1 Mpc,
and allow the halo concentration and the amplitude of the
large-scale structure term to vary. They also use a model
for BCG centroiding errors based on mock catalogues, and
incorporate this model into their fitting routine to correct
for the tendency of centroid errors to suppress the
estimated masses. They explicitly include lognormal scatter on
the mass-richness relation (with a strong prior in the fits).
Finally, while they use the same galaxy shape
measurements, they use different photometric redshifts, which we
have shown in Mandelbaum et al. (2008b) leads to a
calibration bias in the lensing signal of -15 per cent. Since we
have found that the fitted masses when assuming an NFW
profile scale like ΔΣ^1.4, this bias in ΔΣ corresponds to a 20
per cent suppression of the masses. Thus, while they find
M_200b,20 ∼ 1.2 and γ = 1.3 (for a spherical over-density
of 180ρ̄, which should only differ from our definition by
several per cent), we compare against a corrected value of
M_200b,20 = 1.5. This result is within a few per cent of our
value of M_200b,20 = 1.54 that we have adopted here. Given
the different richness range (which also contributes to the
different value of γ) and the many other differences in fit
procedure, the three per cent discrepancy is not of concern,
and is comparable to our quoted systematic uncertainty.
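The calibration corrections used in these comparisons follow
from the scaling of fitted mass with signal amplitude. A
minimal sketch, assuming M ∝ ΔΣ^1.4 as stated above and a
hypothetical helper function, reproduces the quoted numbers:

```python
def correct_mass(mass, calibration_bias, eta=1.4):
    """Remove the effect of a fractional calibration bias on a fitted mass.

    Assumes M ∝ ΔΣ^eta, so a signal biased by (1 + calibration_bias)
    yields a mass biased by (1 + calibration_bias)^eta.
    """
    return mass / (1.0 + calibration_bias) ** eta


# A -15 per cent bias in ΔΣ suppresses the mass by about 20 per cent.
suppression = 1.0 - (1.0 - 0.15) ** 1.4

# Correcting an amplitude of 1.2 for that bias gives roughly 1.5.
corrected_amplitude = correct_mass(1.2, -0.15)

# A signal that is 6 per cent too high raises the mass by about 9 per cent.
boost = 1.06**1.4 - 1.0
```

The exponent 1.4 applies to fixed-concentration NFW fits to
ΔΣ(R); for other fit methods the appropriate η from Fig. 6
would be used instead.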

6 CONCLUSIONS

In this paper, we have assessed the degree to which cer- 
tain systematic errors in lensing measurement and methods 
of mass estimation can bias weak lensing cluster mass esti- 
mates. In brief, the challenges we considered included the 
following. 

• Lensing calibration bias, which leads to changes in the
mass ∝ ΔΣ^η for η typically in the range 1.4-2 depending
on the radial range and fit method used for the parametric
NFW mass fits (lower for ΔΣ(R) than for Υ(R; R_0)),
or ∝ ΔΣ for the non-parametric mass estimates within a
fixed physical aperture (or a steeper scaling when estimating
the mass within some spherical over-density radius) using ζ_c
(Section I2XT1 15X21 and EXT). 

• Offsets of the identified BCG from the minimum of the
cluster potential well (Section 2.2.3) were incorporated into
the lensing profiles using a model from mock catalogues
presented in Johnston et al. (2007). This model includes the
effects of photometric errors in selecting the wrong BCG, and
is therefore an overly conservative estimate in cases where
the BCG can be unambiguously identified for all clusters or
where X-ray data can precisely locate the cluster centre.

• The effects of differences between an assumed NFW
concentration and the true NFW concentration were studied
using pure NFW lensing signals.

• Differences in the halo profile relative to a pure NFW
profile were studied using fits to density profiles from
N-body simulations.

• The effects of mass from structures other than the cluster
itself on the lensing signal were also studied using the
signal from simulations, since we have not included only the
mass that is virialized when computing $\Delta\Sigma$ in the
simulations.

• Contamination of the source sample by cluster member
galaxies, intrinsic alignments of those member galaxies, and
baryonic effects on the halo density profile were considered
to be included among the previous tests, namely changes in
NFW concentration (in Section 5.1), changes in the inner
region of the profile using variations of the N-body simulation
outputs (in Section 5.2), and centroid offsets that modify the
signal only in the inner regions of the cluster.

When fitting a parametric model (in our case, the NFW
profile) to $\Delta\Sigma(R)$, with fixed concentration, we find that
the uncertainties due to unknown true concentration plus
changes in the lensing profile due to small-scale systematics
yield systematic errors that range from a factor of two
in mass (when only using small scales in the fits, e.g.
0.1-1 $h^{-1}$ Mpc) to tens of per cent (when using
$R > 0.5\,h^{-1}$ Mpc) to several per cent (for $R > 2\,h^{-1}$ Mpc,
which yields stable mass estimates but large statistical errors,
and which may not be available for individual cluster lensing
analyses due to limited telescope FOV). This level of systematic
error occurred when allowing a relatively broad variation
in concentration ($4 < c_{200b} < 7$), given the disagreement
between simulations on the concentration-mass relation at
high masses, the large lognormal scatter in this relation,
and other systematics such as baryonic effects discussed in
Sections 2.2, 2.3 and 2.6. When using a narrower range in
concentration, the systematic errors decreased comparably,
but are still unacceptably large relative to what is needed
for precise cosmological parameter constraints.

The addition of centroiding errors to the list of systematics
we considered led to uniform suppression of the mass
estimates of order tens of per cent (for
$R_{\min} = 0.1\,h^{-1}$ Mpc). To completely avoid this suppression
while fitting to $\Delta\Sigma(R)$ and ignoring the possibility of
centroiding errors, we found it necessary to restrict the fits to
$R_{\min} > 1\,h^{-1}$ Mpc. Generally, the addition of larger scales,
out to $\sim 2 r_{200b}$, is useful in minimising the effects of
small-scale systematics; going beyond that can lead to excessive
contribution from large-scale structure, which will bias the mass
estimates if it is not modelled accurately. Allowing a variation
in concentration in the fits is another way to reduce systematic
error, at the expense of statistical errors that are increased by
45 per cent, but this scheme is not helpful when dealing with
systematics that have a radial profile that does not mimic a
change in concentration. $\Upsilon(R; R_0)$ is still more reliable
at removing the impact of small-scale systematics on the lensing
signal.

The aperture mass statistic $\zeta_c$ led to accurate estimates
of projected masses, provided that either (a) the mass in
the outer annulus was estimated rather than ignored, or (b)
the mass in the outer annulus was ignored, but
$R_{o1} \gg R_1$ (i.e. a large range of transverse separations was
included in the first integral in Eq. (12)). For many
applications, such as the halo mass function, the quantity of
interest is the 3d virial mass, for which a density profile must
be assumed to do the conversion from the 2d projected mass within
$R_1$. We found that uncertainty in the true density profile led
to tens of per cent level biases in the 3d virial masses. The
effect of centroiding errors was to uniformly suppress the
aperture masses by $\sim 10$-20 per cent depending on the halo
mass, degree of centroiding errors, and transverse separations
used for the analysis; these biases were then propagated into the
3d enclosed mass estimates. The aperture mass-based estimates
of the cluster virial mass were substantially noisier
than fits to $\Delta\Sigma(R)$ using the same range of scales.
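
The role of the outer annulus can be illustrated with a minimal
sketch. It assumes the form of the aperture mass statistic commonly
used in the literature, in which $\zeta_c$ equals the mean surface
density within $R_1$ minus the mean surface density in the annulus
$R_{o1} < R < R_{o2}$; the notation of Eq. (12) may differ. The toy
profile (a singular isothermal sphere with $\Sigma = A/R$, so that
$\Delta\Sigma = A/R$) and all function names are assumptions made for
illustration, so the numbers are not those of the paper's NFW-based
tests.

```python
import math


def integrate_dlnR(f, R_lo, R_hi, n=2000):
    """Midpoint-rule integral of f(R) d ln R between R_lo and R_hi."""
    a, b = math.log(R_lo), math.log(R_hi)
    step = (b - a) / n
    return step * sum(f(math.exp(a + (i + 0.5) * step)) for i in range(n))


def zeta_c(delta_sigma, R1, Ro1, Ro2):
    """Mean Sigma inside R1 minus mean Sigma in the annulus Ro1 < R < Ro2."""
    inner = 2.0 * integrate_dlnR(delta_sigma, R1, Ro1)
    outer = (2.0 * Ro2**2 / (Ro2**2 - Ro1**2)
             * integrate_dlnR(delta_sigma, Ro1, Ro2))
    return inner + outer


A = 1.0  # arbitrary amplitude of the toy profile


def sis_delta_sigma(R):
    """Toy profile: Sigma = A/R gives DeltaSigma = A/R."""
    return A / R


# Aperture radii in h^-1 Mpc, taken from one of the rows quoted in the text.
R1, Ro1, Ro2 = 0.5, 1.1, 2.0

zc = zeta_c(sis_delta_sigma, R1, Ro1, Ro2)
mean_sigma_within_R1 = 2.0 * A / R1          # exact for the toy profile
mean_sigma_annulus = 2.0 * A / (Ro1 + Ro2)   # exact for the toy profile

# Including the annulus term recovers the true mean surface density.
recovered = zc + mean_sigma_annulus
# Ignoring it underestimates the projected mass by this fraction.
missing_fraction = mean_sigma_annulus / mean_sigma_within_R1

print(f"zeta_c           : {zc:.4f}")
print(f"recovered / true : {recovered / mean_sigma_within_R1:.6f}")
print(f"missing fraction : {missing_fraction:.3f}")
```

For this toy profile the missing fraction is $R_1/(R_{o1}+R_{o2})$,
which shrinks as the annulus moves outward relative to $R_1$.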

Finally, the new statistic we introduce here,
$\Upsilon(R; R_0)$, which removes the effect of small scales from
the lensing signal, gave superior performance over
$\Delta\Sigma(R)$ when fitting an NFW profile to the cluster lensing
signal. This statement is true not just for the basic tests with
pure NFW profiles and profiles from simulations, but also when
including the effects of centroiding errors. The increases in
statistical error on the mass can be $\sim 40$ per cent relative
to fitting to $\Delta\Sigma(R)$ over the same scales. The residual
systematic uncertainties after removal of an overall offset in
the masses are of order several per cent, when fitting from
$0.5 < R < 4\,h^{-1}$ Mpc, as demonstrated using SDSS maxBCG data.
The effects of $\Upsilon(R; R_0)$ in decreasing systematic error
are less dramatic when only small scales
($\lesssim 2\,h^{-1}$ Mpc) are used for the mass estimates;
however, the residual systematics of order 10 per cent are
still at least a factor of two smaller than when fitting to
$\Delta\Sigma(R)$.
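
A minimal numerical sketch shows why $\Upsilon(R; R_0)$ is
insensitive to the inner region. It assumes the definition
$\Upsilon(R; R_0) = \Delta\Sigma(R) - (R_0/R)^2\,\Delta\Sigma(R_0)$
and uses the fact that a point mass $M$ at the centre adds
$M/(\pi R^2)$ to $\Delta\Sigma$ at all $R > 0$. The power-law profile,
the added mass, and the function names are hypothetical choices made
only for illustration.

```python
import math


def upsilon(delta_sigma, R, R0):
    """ADSD: Upsilon(R; R0) = DeltaSigma(R) - (R0/R)^2 * DeltaSigma(R0)."""
    return delta_sigma(R) - (R0 / R) ** 2 * delta_sigma(R0)


def smooth_profile(R):
    """Hypothetical power-law DeltaSigma(R); units are arbitrary."""
    return 100.0 * R ** -0.8


def contaminated_profile(R, extra_mass=50.0):
    """Same profile plus a hypothetical point mass at the centre.

    A central point mass M adds M / (pi R^2) to DeltaSigma for R > 0.
    """
    return smooth_profile(R) + extra_mass / (math.pi * R ** 2)


R0 = 0.3
radii = [0.3, 0.5, 1.0, 2.0, 4.0]

# The raw signal is shifted at every radius by the central mass...
signal_shift = [contaminated_profile(R) - smooth_profile(R) for R in radii]

# ...but Upsilon is unchanged, up to floating-point rounding.
clean = [upsilon(smooth_profile, R, R0) for R in radii]
dirty = [upsilon(contaminated_profile, R, R0) for R in radii]
max_diff = max(abs(a - b) for a, b in zip(clean, dirty))

print("shift in DeltaSigma:", [round(s, 3) for s in signal_shift])
print("max change in Upsilon:", max_diff)
```

The same cancellation holds for any rearrangement of mass confined
within $R_0$, since at $R > R_0$ such a change enters
$\Delta\Sigma$ only through a term proportional to $1/R^2$.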

These conclusions also apply for individual cluster lensing
analyses; however, we caution that in that case, we expect
additional uncertainties in the true halo profile due to
contamination by cluster member galaxies, the lognormal
scatter in concentration at fixed mass, mergers, substructure,
triaxiality, and projection effects (Sections 2.2 and 2.3),
so the systematic errors will tend to be larger than for
stacked analyses using the same mass estimation method.

Next, we will briefly discuss the implications of
our findings about mass estimation methods for several
previously-published cluster lensing studies. We begin with
Okabe et al. (2009), which contains an analysis of
circularly-averaged cluster lensing data for thirty individual
clusters by comparison with spherical models. They begin with
direct fitting of the tangential shear profile to parametric
models, including the NFW profile. These fits allow all model
parameters to vary; for example, the NFW fits have two
parameters, a mass and a concentration, unlike the cases
we have considered here with a fixed concentration.
Consequently, the estimated masses from the NFW model fits are
unlikely to be strongly biased due to modeling assumptions,
since the concentration is not fixed. However, they may still
have some systematic bias because the NFW profile does not
describe cluster profiles well, because individual cluster
profiles deviate from it owing to substructure, mergers, and
triaxiality, and possibly significant biases due to small-scale
systematics such as contamination by cluster member galaxies and
centroiding errors.

Okabe et al. (2009) also use the aperture mass statistic
$\zeta_c$ to estimate $M_{2d}$, while neglecting the second term
in Eq. (12) and choosing the outer annulus such that it
does not contain any significant structures. As we have
seen here, the aperture mass statistic when including both
terms properly leads to quite accurate projected mass
estimates, or can yield results that are accurate at the
several per cent level even without the second term provided
that $R_1 \ll R_{o1}$. Given the scales that are accessible
with the Subaru Suprime-Cam, and the typical cluster
redshifts, we should compare against the top portion of
the Table, in the rows with
$(R_1, R_{o1}, R_{o2}) = (0.275, 1.1, 2)$ and
$(0.5, 1.1, 2)\,h^{-1}$ Mpc. Those results suggest that for the most
massive clusters, neglect of the second term may cause 15-20
per cent suppression of the projected masses. We find that
the suppression is reduced to 5-10 per cent for more typical
cluster masses of $10^{14}\,h^{-1}\,M_\odot$. Furthermore, as we
have already seen, effects that suppress the signal in the inner
cluster regions, such as centroiding errors and contamination
by cluster member galaxies, can suppress the aperture
masses at the $\sim 10$ per cent level.

Hoekstra (2007) contains an analysis of cluster weak
lensing data for twenty individual clusters. This work utilises
parametric mass estimates from the tangential shear distortion
averaged in annuli, fitting to an NFW profile with fixed
concentration-mass relation from N-body simulations using
$0.25 < R < 1.5\,h^{-1}$ Mpc. In this case, we can assess
systematic uncertainties as being somewhere between the results
for $(R_{\min}, R_{\max}) = (0.25, 1)$ and $(0.25, 2)\,h^{-1}$ Mpc
in the corresponding figure.
That figure suggests that uncertainties due to differences
between the assumed and the true profile lead to $\sim 50$ per
cent variations in the estimated cluster halo masses. This
variation may be manifested as significant noisiness in the
mass estimates for a given true mass, as well as a mean bias
if the true profiles (with the imposition of systematics such
as contamination by cluster member galaxies) differ from
the NFW profile with that assumed concentration-mass
relation. This problem is in addition to other uncertainties in
individual cluster mass estimates noted previously, such as
LSS (for which they explicitly increase their error bars) and
triaxiality.

Hoekstra (2007) also use the aperture mass statistic to
estimate projected masses, $M_{2d}$, while estimating the second
term in Eq. (12) due to the outer annulus using the
best-fitting NFW model. In that case, we note that while
Hoekstra (2007) do not miss mass by excluding the second
term in the aperture mass calculation, their conversion from
$M_{2d}(<R_1)$ to virial radii using spherical overdensities that
can define the mass function will be strongly
concentration-dependent. While Hoekstra (2007) claim that the fact
that the masses from the fits to $\Delta\Sigma(R)$ and from the
aperture mass calculation are consistent shows that their fitting
procedure is unbiased, as discussed in Section 5.1.2 this claim is
not true. The fact that Vikhlinin et al. (2006) use the cluster
mass estimates from this work to calibrate their mass
function constraints is therefore of concern, because of the
possible biases due to these systematics in the signal and
the large systematic scatter that we have found.

In summary, we believe that weak lensing is the best
observational technique to robustly estimate cluster virial
masses (regardless of their dynamical state) at the level
required for precision cosmology. Given the small statistical
errors of recent cluster abundance analyses, the cosmological
constraints are already dominated by the systematic
precision of the cluster mass determination (Vikhlinin et al.
2006). As we argue in this paper, current methods are
inadequate for this purpose because they rely on the information
from the inner parts of the cluster, which can be contaminated
or modified due to a variety of effects discussed in this
paper, and because they do not use numerical N-body
simulations to calibrate their results. Our results suggest
eliminating lensing information from scales below $R_0$ (for which
we suggest the range $0.2 < R_0 < 0.5\,h^{-1}$ Mpc or about 15-25
per cent of the virial radius, as determined via an iterative
procedure). Our proposed statistic for parametric estimates
of cluster mass, the ADSD $\Upsilon(R; R_0)$, achieves this by
construction, and is consequently more robust to many different
systematics and to the details of the model to which the
data are fitted, all of which are more problematic in the inner
parts of the cluster. Use of $\Upsilon(R; R_0)$ to estimate cluster
masses allows systematic errors to be reduced to the several
per cent level, which is up to a factor of 10 smaller than
when fitting to the lensing signal $\Delta\Sigma(R)$ itself,
suggesting that for current and future datasets,
$\Upsilon(R; R_0)$ should be the statistic of choice for parametric
mass fitting to cluster weak lensing data. While we have focused
on clusters in this paper, similar concerns about accurately
determining the halo mass would arise also for smaller halos. For
these, the stellar component from the galaxy (and possibly the
associated redistribution of the dark matter) would modify the
mass distribution relative to predictions from pure N-body
simulations in the inner parts, suggesting that eliminating the
inner halo information by using $\Upsilon(R; R_0)$ could lead to
more accurate mass determination of group and galaxy type halos
as well.